Coronavirus COVID-19 Cases in Finland

Bernardo Di Chiara, Data Analyst

http://fi.linkedin.com/in/bernardodichiara

Last full updates to the comments: June 16th 2020

Last plotted day: see the end of this file

Table of Contents

1. Executive Summary
....1.1. References
2. Setup
3. Defining the Needed Functions
....3.1. Dataframes and Lists Handling
....3.2. Plots
....3.3. Project-specific Functions
4. Dumping and Collecting the Data
5. Data Analysis
....5.1. Summary
....5.2. Preliminary Data Analysis
....5.3. Data Cleansing
....5.4. Data Preparation
............5.4.1. New datasets with no NaN, no GPS coordinates / list of days / list of Countries
............5.4.2. Population age data
............5.4.3. World Data
............5.4.4. Finnish Data
............5.4.5. Data from other Scandinavian Countries and Estonia
............5.4.6. Data from other European Countries
............5.4.7. Data from UK and US
............5.4.8. Data from Brazil, Russia and India
............5.4.9. Data from China
....5.5. Summary of the Created Datasets
6. Domain-Specific Concepts
7. Data Visualization
....7.1. Overview
............7.1.1. General Comments to the Plots
............7.1.2. A Reference Curve Set
....7.2. Finnish Internal Situation
....7.3. Comparison with the Closest Neighboring Countries
............7.3.1. Comparison with Other Scandinavian Countries and Estonia
....7.4. Comparison with other European Countries
....7.5. Situation in China
....7.6. Situation in Italy
....7.7. UK and US
....7.8. Brazil, Russia and India
....7.9. Normalizing by Country population
............7.9.1. List of Variables Affecting Potentially the Curves
............7.9.2. Confirmed Cases: Summary of Findings from the Analysis
............7.9.3. Deceased Cases: Summary of Findings from the Analysis
....7.10. Demographic Considerations
....7.11. World View
............7.11.1. Lethality
8. Statistics
....8.1. World view
....8.2. Top Ten Countries
....8.3. Finland
9. Conclusions
10. Acknowledgements

1. Executive Summary

This notebook contains visualizations related to the spread of the Coronavirus COVID-19 with a focus on Finland.

The data is taken from the Johns Hokpins University (JHU) /1/.

There are a few good dashboards in the Web about this topic (for example, by Johns Hokpins University /2/ and by Tableau /3/). In addition, there is a good site with latest information about Finland broken down by Region /4/. Another very useful source of information is the European Centre for Disease Prevention and Control /5/. Still, it might be beneficial to manipulate the data in order, for example, to compare Finnish curves with curves from other Countries.

Having updated charts is very useful both for authorities and for the population in order to make fact-based decisions that help to contain the positive cases so not to overload the hospitals and therefore minimizing the casualties.

Comparing Finnish curves to those of neighboring Countries might provide useful insights since, in addition to the geographical proximity and similar weather, those Countries have certain similarities in culture, behavior patterns and may be genetics.

Sections from 2 to 5 contain mostly code which is needed to define the used functions and to dump, cleanse and prepare the data.

General domain specific concepts are contained in section 6. An overview chapter containing a description of the plots and the illustration of a reference case is contained at the beginning of section 7.

Line plots containing confirmed cases each day as well as recovered and deceased cases have been produced. The active cases have been shown in the same plot.

Other plots containing the new confirmed daily cases, which shows the speed at which the virus is spreading, have been added as well. Daily increments have been plotted also for the deceased and the active cases.

Finnish curves have been compared to the curves of the other Scandinavian Countries as well as few other European Countries. Curves of UK, US, Brazil, Russia and India have been plotted as well.

Plots showing the number of confirmed cases per capita have been created to eliminate the population variable from the comparisons. Other plots have been created to normalize by the density of the population.

Finally, plots with worldwide data have been produced. This includes also a couple of plots that try to put the number of deceased cases into context.

Bar plots containing data of the most affected Countries have been added.

Due to the criticality of this information, no recommendations are included in this paper. Currently, Doctors and Authorities are the best sources for such recommendations.

If you are not interested in the code, go to section 6 and onward and focus on the plots, the tables and the plain text.

DISCLAIMER:

  • The code has not been peer-reviewed. If someone is wishing to do it, please contact the author.
  • The data related to the last day might be incomplete.
  • See also the legal disclaimer.

The spread of virus follows the rules of mathematics and statistics (Dr. Katharina Hauck, https://www.imperial.ac.uk/people/k.hauck).

1.1. References

/1/ [GitHub Repository by Johns Hokpins University](https://github.com/CSSEGISandData/COVID-19)
https://github.com/CSSEGISandData/COVID-19

/2/ [Dashboard by Johns Hokpins University with world-wide view](https://www.arcgis.com/apps/opsdashboard/index.html#/bda7594740fd40299423467b48e9ecf6)
https://www.arcgis.com/apps/opsdashboard/index.html#/bda7594740fd40299423467b48e9ecf6

/3/ Dashboard by Tableau with both global and Country-specific data
https://public.tableau.com/profile/covid.19.data.resource.hub#!/vizhome/COVID-19Cases_15840488375320/COVID-19Cases

/4/ Latest news about Finland broken by Region
https://finland-coronavirus-map.netlify.com/

/5/ European Centre for Disease Prevention and Control
https://www.ecdc.europa.eu/en/novel-coronavirus-china

/6/ Coursera: Let's Talk About COVID-19
https://www.coursera.org/learn/covid-19/home/welcome

2. Setup

In [1]:
# Importing the needed packages
import os
import datetime as dt
import regex as re
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Displaying all the dafaframe columns and rows
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

# Setting a time stamp
start_time = dt.datetime.utcnow()

3. Defining the Needed Functions

3.1. Dataframes and Lists Handling

In [2]:
def df_basic_data(dfname):
    '''
    This function prints basic information about a given dataframe.
    The function needs as input parameters the dataframe name.
    '''

    import pandas as pd

    # Fetching the dataframe name
    name = [x for x in globals() if globals()[x] is dfname][0]
    print("Dataframe name:", name, "\n")
    print("Dataframe length:", len(dfname), "\n")
    print("Number of columns:", len(dfname.columns), "\n")
    # Columns data types
    data_types = dfname.dtypes
    # Distint values
    distint_values = dfname.apply(pd.Series.nunique)
    # Amount of null values
    null_values = dfname.isnull().sum()
    print("Dataframe's columns names, column data types, amount of distint "
          "(non null) values\n"
          "and amount of null values for each column:")
    df_index = ['Data_Type',
                'Amount_of_Distint_Values',
                'Amount_of_Null_Values']
    col_types_dist_null = pd.DataFrame([data_types,
                                        distint_values,
                                        null_values],
                                       index=df_index)
    return col_types_dist_null.transpose()
In [3]:
def calc_increments(listname):
    '''
    This function:
    takes a list,
    calculates the delta between each element and its predecessor,
    returns the result in a new list having the same lenght as the original list
    '''

    # Initializing an empty list of floats to contain the increments
    increments = []
    # Adding zero to the first element
    increments.append(0.0)
    # Looping through all the occurrencies except the first one
    for i in list(range(1, len(listname))):
        # Calculating the increment
        delta = listname[i]-listname[i-1]
        # Adding the result to the list
        increments.append(delta)
    # Returning the result
    return increments
In [4]:
def find_neg_increm(listname):
    '''
    This function:
    takes a list,
    calculates the delta between each element and its predecessor,
    checks if the increment is negative and
    returns the result in a new list with boolean values having the same lenght as the original list
    '''

    # Initializing an empty list of floats to contain the increments
    neg_increments = []
    # Adding zero to the first element
    neg_increments.append(0)
    # Looping through all the occurrencies except the first one
    for i in list(range(1, len(listname))):
        # Calculating the increment
        delta = listname[i]-listname[i-1]
        # Checking if the increment is negative
        if delta < 0:
            neg_increments.append(1)
        else:
            neg_increments.append(0)
    # Returning the result
    return neg_increments

3.2. Plots

In [5]:
def cust_line_plot(*parameters,
                   figsize_w=8, figsize_h=6,
                   title=None,
                   title_fs=16, title_offset=20,
                   rem_borders=False,
                   label_fs=12, tick_fs=6, 
                   x_label=None,
                   rot=0,
                   y_label=None,
                   legend=False, leg_fs=10, legend_loc=0,
                   first_line_x=None, first_line_col=7,
                   first_line_ls=':', first_line_x_l=None,
                   second_line_x=None, second_line_col=7,
                   second_line_ls='--', second_line_x_l=None,
                   third_line_x=None, third_line_col=7,
                   third_line_ls='-.', third_line_x_l=None,
                   fourth_line_x=None, fourth_line_col=7,
                   fourth_line_ls='-', fourth_line_x_l=None,
                   fifth_line_x=None, fifth_line_col=8,
                   fifth_line_ls=':', fifth_line_x_l=None,
                   sixth_line_x=None, sixth_line_col=8,
                   sixth_line_ls='--', sixth_line_x_l=None,
                   seventh_line_x=None, seventh_line_col=8,
                   seventh_line_ls=':', seventh_line_x_l=None,
                   eighth_line_x=None, eighth_line_col=8,
                   eighth_line_ls='-', eighth_line_x_l=None):
    """
    This function plots a scatterplot for the provided data
    and customizes the way the chart looks by using the value of
    the provided parameters.

    Keyword arguments:
    parameters       -- A (mandatory) tuple of 5 elements containing:
                        a list with the x values,
                        a list with the y values,
                        a string containing the selected marker,
                        a string containing the selected line style,
                        an integer (from 0 to 9) selecting the seaborn-deep
                        color,
                        a string containing the text for the legend
    figsize_w        -- The width of the plot area
    figsize_w        -- The height of the plot area
    title            -- A string containing the title of the chart
    title_fs         -- The title font size
    title_offset     -- Distance between the title and the top of the chart
    rem_borders      -- If True the top and right borders are removed
                        (default: False)
    label_fs         -- x and y axis labels' font size
    tick_fs          -- The tick values font size
    x_label          -- Label for the x-axis (string)
    rot              -- The rotation angle of the tick values
    y_label          -- Label for the y-axis (string)
    legend           -- A boolean variable that tells if to plot a legend
    leg_fs           -- Font size for the legend
    legend_loc       -- An integer from 0 to 9 controlling the legend location
    first_line_x
    ...
    eighth_line_x    -- x coordinates of vertical lines
    first_line_col
    ...
    eighth_line_col -- an integer (from 0 to 9) selecting the seaborn-deep
                        color of the corresponding line    
    first_line_x_l
    ...
    eighth_line_x_l -- legend text for the corresponding lines
    """

    import matplotlib.pyplot as plt
    import seaborn as sns

    # Creating a new figure
    plt.figure(figsize=(figsize_w, figsize_h))
    # Defining the used style
    color_list = sns.color_palette(palette='deep')
    # Adding a title (with some distance to the top of the plot)
    plt.title(title, fontsize=title_fs, pad=title_offset)
    # Removing the top and right borders if so defined
    if rem_borders is True:
        sns.despine(top=True, right=True, left=False, bottom=False)

    # Initializing an empy list to contain the legend text
    leg_text_l = []
    for param in parameters:
        # Extracting the values given in parameters
        x = param[0]
        y = param[1]
        mark = param[2]
        ls = param[3]
        col_numb = param[4]
        leg_text = param[5]
        # Appending the string to the list
        leg_text_l.append(leg_text)
        # Creating the scatter plots
        plot = plt.plot(x, y, marker=mark, linestyle=ls, color=color_list[col_numb])

    # If a label for the x axis is provided, showing it on the x axis
    if x_label:
        plt.xlabel(x_label, fontsize=label_fs)
    plt.xticks(fontsize=tick_fs, rotation=rot)
    # If a label for the y axis is provided, showing it on the y axis
    if y_label:
        plt.ylabel(y_label, fontsize=label_fs)
    plt.yticks(fontsize=tick_fs)

    # Adding vertical lines
    if first_line_x:
        plt.axvline(x=first_line_x, color=color_list[first_line_col],
                    linestyle=first_line_ls)
        leg_text_l.append(first_line_x_l)
    if second_line_x:
        plt.axvline(x=second_line_x, color=color_list[second_line_col],
                    linestyle=second_line_ls)
        leg_text_l.append(second_line_x_l)
    if third_line_x:        
        plt.axvline(x=third_line_x, color=color_list[third_line_col],
                    linestyle=third_line_ls)
        leg_text_l.append(third_line_x_l)
    if fourth_line_x:        
        plt.axvline(x=fourth_line_x, color=color_list[fourth_line_col],
                    linestyle=fourth_line_ls)
        leg_text_l.append(fourth_line_x_l)
    if fifth_line_x:        
        plt.axvline(x=fifth_line_x, color=color_list[fifth_line_col],
                    linestyle=fifth_line_ls)
        leg_text_l.append(fifth_line_x_l)
    if sixth_line_x:        
        plt.axvline(x=sixth_line_x, color=color_list[sixth_line_col], 
                    linestyle=sixth_line_ls)
        leg_text_l.append(sixth_line_x_l)
    if seventh_line_x:        
        plt.axvline(x=seventh_line_x, color=color_list[seventh_line_col], 
                    linestyle=seventh_line_ls)
        leg_text_l.append(seventh_line_x_l)
    if eighth_line_x:        
        plt.axvline(x=eighth_line_x, color=color_list[eighth_line_col], 
                    linestyle=eighth_line_ls)
        leg_text_l.append(eighth_line_x_l) 
    
    # Adding a legend
    if legend:
        plt.legend(labels=leg_text_l, fontsize=leg_fs, loc=legend_loc,
                   facecolor="white", framealpha=1)

    # Showing the plot without additional text
    plt.show()
In [6]:
def cust_bar_plot(parameters,
                  figsize_w=8, figsize_h=6,
                  title=None, title_fs=16, title_offset=20,
                  rem_borders=False,
                  label_fs=12, tick_fs=6,
                  x_label=None,
                  rot=0,
                  y_label=None,
                  legend=False,
                  leg_fs=10,
                  legend_loc=0,
                  first_line_x=None, first_line_col=7,
                  first_line_ls=':', first_line_x_l=None,
                  second_line_x=None, second_line_col=7,
                  second_line_ls='--', second_line_x_l=None,
                  third_line_x=None, third_line_col=7,
                  third_line_ls='-.', third_line_x_l=None,
                  fourth_line_x=None, fourth_line_col=7,
                  fourth_line_ls='-', fourth_line_x_l=None,
                  fifth_line_x=None, fifth_line_col=8,
                  fifth_line_ls=':', fifth_line_x_l=None,
                  sixth_line_x=None, sixth_line_col=8,
                  sixth_line_ls='--', sixth_line_x_l=None,
                  seventh_line_x=None, seventh_line_col=8,
                  seventh_line_ls=':', seventh_line_x_l=None,
                  eighth_line_x=None, eighth_line_col=8,
                  eighth_line_ls='-', eighth_line_x_l=None,                  
                  first_line_y=None, first_line_y_l=None,
                  second_line_y=None, second_line_y_l=None,
                  third_line_y=None, third_line_y_l=None,
                  fourth_line_y=None, fourth_line_y_l=None):
    """
    This function plots a bar plot for the provided data
    and customizes the way the chart looks by using the value of
    the provided parameters.

    Keyword arguments:
    parameters       -- A (mandatory) tuple of 4 elements containing:
                        a list with the x values,
                        a list with the y values,
                        an integer (from 0 to 9) selecting the seaborn-deep
                        color,
                        a string containing the text for the legend
    figsize_w        -- The width of the plot area
    figsize_w        -- The height of the plot area
    title            -- A string containing the title of the chart
    title_fs         -- The title font size
    title_offset     -- Distance between the title and the top of the chart
    rem_borders      -- If True the top and right borders are removed
                        (default: False)
    label_fs         -- x and y axis labels' font size
    tick_fs          -- The tick values font size
    x_label          -- Label for the x-axis (string)
    rot              -- The rotation angle of the tick values
    y_label          -- Label for the y-axis (string)
    legend           -- A boolean variable that tells if to plot a legend
    leg_fs           -- Font size for the legend
    legend_loc       -- An integer from 0 to 9 controlling the legend location
    first_line_x
    second_line_x
    third_line_x     
    fourth_line_x
    fifth_line_x     
    sixth_line_x     -- x coordinates of vertical lines
    first_line_col
    second_line_col
    third_line_col     
    fourth_line_col
    fifth_line_col
    sixth_line_col   -- an integer (from 0 to 9) selecting the seaborn-deep
                        color of the corresponding line    
    first_line_x_l
    ...
    eighth_line_x_l   -- legend text for the corresponding lines
    first_line_y
    second_line_y
    third_line_y     
    fourth_line_y    -- y coordinates of horizontal lines
    first_line_y_l
    second_line_y_l
    third_line_y_l   
    fourth_line_y_l  -- legend text for the corresponding lines
    """

    import matplotlib.pyplot as plt
    import seaborn as sns

    # Creating a new figure
    plt.figure(figsize=(figsize_w, figsize_h))
    # Defining the used style
    color_list = sns.color_palette(palette='deep')
    # Adding a title (with some distance to the top of the plot)
    plt.title(title, fontsize=title_fs, pad=title_offset)
    # Removing the top and right borders if so defined
    if rem_borders is True:
        sns.despine(top=True, right=True, left=False, bottom=False)

    # Initializing an empy list to contain the legend text
    leg_text_l = []
    # Extracting the values given in parameters
    x = parameters[0]
    y = parameters[1]
    col_numb = parameters[2]
    leg_text = parameters[3]

    # Creating the bar plot
    plot = plt.bar(x, y, color=color_list[col_numb])

    # If a label for the x axis is provided, showing it on the x axis
    if x_label:
        plt.xlabel(x_label, fontsize=label_fs)
    plt.xticks(fontsize=tick_fs, rotation=rot)
    # If a label for the y axis is provided, showing it on the y axis
    if y_label:
        plt.ylabel(y_label, fontsize=label_fs)
    plt.yticks(fontsize=tick_fs)

    # Adding vertical lines
    if first_line_x:
        plt.axvline(x=first_line_x, color=color_list[first_line_col],
                    linestyle=first_line_ls)
        leg_text_l.append(first_line_x_l)
    if second_line_x:
        plt.axvline(x=second_line_x, color=color_list[second_line_col],
                    linestyle=second_line_ls)
        leg_text_l.append(second_line_x_l)
    if third_line_x:        
        plt.axvline(x=third_line_x, color=color_list[third_line_col],
                    linestyle=third_line_ls)
        leg_text_l.append(third_line_x_l)
    if fourth_line_x:        
        plt.axvline(x=fourth_line_x, color=color_list[fourth_line_col],
                    linestyle=fourth_line_ls)
        leg_text_l.append(fourth_line_x_l)
    if fifth_line_x:        
        plt.axvline(x=fifth_line_x, color=color_list[fifth_line_col],
                    linestyle=fifth_line_ls)
        leg_text_l.append(fifth_line_x_l)
    if sixth_line_x:        
        plt.axvline(x=sixth_line_x, color=color_list[sixth_line_col], 
                    linestyle=sixth_line_ls)
        leg_text_l.append(sixth_line_x_l)
    if seventh_line_x:        
        plt.axvline(x=seventh_line_x, color=color_list[seventh_line_col], 
                    linestyle=seventh_line_ls)
        leg_text_l.append(seventh_line_x_l)
    if eighth_line_x:        
        plt.axvline(x=eighth_line_x, color=color_list[eighth_line_col], 
                    linestyle=eighth_line_ls)
        leg_text_l.append(eighth_line_x_l) 
        
    # Adding horizontal lines
    if first_line_y:
        plt.axhline(y=first_line_y, color='grey', linestyle=':')
        leg_text_l.append(first_line_y_l)
    if second_line_y:
        plt.axhline(y=second_line_y, color='grey', linestyle='--')
        leg_text_l.append(second_line_y_l)
    if third_line_y:
        plt.axhline(y=third_line_y, color='grey', linestyle='-.')
        leg_text_l.append(third_line_y_l)
    if fourth_line_y:
        plt.axhline(y=fourth_line_y, color='grey', linestyle='-.')
        leg_text_l.append(fourth_line_y_l)

    # Adding a legend
    if legend:
        leg_text_l.append(leg_text)
        plt.legend(labels=leg_text_l, fontsize=leg_fs, loc=legend_loc,
                   facecolor="white", framealpha=1)

    # Showing the plot without additional text
    plt.show()
In [7]:
def plot_stacked_bar(x, data, series_labels, col,
                     multidim=True, figsize_w=8, figsize_h=6,
                     title=None, title_fs=16,
                     frame=True,
                     category_labels=None,
                     label_fs=12, ticks_fs=12,
                     x_label=None, rot=0,
                     y_label=None,
                     legend=True, legend_loc=0, legend_fs=10,
                     add_text=None, addtext_x=0, addtext_y=0, addtext_fs=10):
    """
    This function plots a stacked bar chart with the provided data and
    labels.

    Keyword arguments:
    x               -- A list containing the x values (mandatory)
    data            -- A list of lists where each internal list contains
                       data of a series (mandatory)
    series_labels   -- List of series labels (strings) (these appear in
                       the legend) (mandatory)
    col             -- A list of integers controlling the colors of the series
                       (mandatory)
    multidim        -- Defines if data is multidimensional (default is True)
    figsize_w       -- The width of the plot area
    figsize_w       -- The height of the plot area
    title           -- A string containing the title of the chart
    title_fs        -- The title font size
    frame           -- If False, the figure frame is omitted as well as
                       ticks and labels on the y axis
    category_labels -- List of category labels (strings) (these appear
                       on the x-axis)
    label_fs        -- x and y axis labels' font size
    tick_fs         -- The tick values font size
    rot             -- The rotation of the x axisis label (numerical)
                       (the default is horizontal)
    y_label         -- Label for the y-axis (string)
    legend          -- If true it shows a legend
    legend_loc      -- Used to position the legend compared to the centre
                       of the plot
    legend_fs       -- Legend font size
    add_text        -- Additional text to be shown in a box (string)
    addtext_x       -- Used to position the additional text box
    addtext_y       -- Used to position the additional text box
    addtext_fs      -- Font size of the additional text
    """

    # Finding the number of categories
    if multidim:
        cat_number = len(data[0])
    else:
        cat_number = len(data)

    # Preparing the indexes for the x axis
    ind = list(range(cat_number))
    # Initializing a list
    axes = []
    # Defining a numpy array containing the y coordinates of the bars
    # (the bars of the first series are on the x axis)
    bar_base = np.zeros(cat_number)
    # Converting the list with the data into a numpy array
    data = np.array(data)

    # Creating a new figure
    plt.figure(figsize=(figsize_w, figsize_h))
    # Defining the used style
    color_list = sns.color_palette(palette='deep')
    # Adding a title (with some distance to the top of the plot)
    plt.title(title, fontsize=title_fs, pad=20)
    # Removing the frame and y axis ticks and values if so defined
    if frame is False:
        sns.despine(top=True, right=True, left=False, bottom=False)

    # If category labes are provided, showing them on the x axis
    if category_labels:
        plt.xticks(ind, category_labels, fontsize=ticks_fs, rotation=rot)

    # If a label for the x axis is provided, showing it on the x axis
    if x_label:
        plt.xlabel(x_label, fontsize=label_fs)
    # If a label for the y axis is provided, showing it on the y axis
    if y_label:
        plt.ylabel(y_label, fontsize=label_fs)

    if multidim:
        # Iterating through the dimensions of the array
        for i, row_data in enumerate(data):
            # Creating the bars
            axes.append(plt.bar(x, row_data, bottom=bar_base,
                                color=color_list[col[i]],
                                label=series_labels[i]))
            # Incrementing the bar base height for the next series
            # by the height of the bar of the previous series
            bar_base += row_data
    else:
        # Creating the bars
        axes.append(plt.bar(x, data))

    # Creating a legend
    if legend:
        plt.legend(fontsize=legend_fs, loc=legend_loc,
                   facecolor="white", framealpha=1)

    # Adding a text box with additional information
    if add_text:
        box_style = dict(facecolor='white')
        plt.gcf().text(addtext_x, addtext_y,
                       add_text,
                       fontsize=addtext_fs, bbox=box_style)

    # Showing the plot without additional text
    plt.show()
In [8]:
def plot_cust_hbar(data,
                   figsize_w=8, figsize_h=6,
                   frame=True, grid=False,
                   ref_font_size=12,
                   title_text=None,
                   title_offset=20,
                   color_numb=0,
                   categ_labels=True,
                   labels=None,
                   rot=0,
                   show_values=False,
                   omitted_value=0,
                   percent=False,
                   center_al=True,
                   visible_digits=2):
    """
    This function plots a horizontal bar charts for the provided data with
    the provided labels and settings.

    Keyword arguments:
    data            -- A sorted Series that contains categorical data
                       (mandatory)
    figsize_w       -- The width of the plot area
    figsize_h       -- The height of the plot area
    frame           -- If False, the figure frame is omitted as well as
                       ticks and labels on the y axis (default is True)
    grid            -- If True a horizontal grid is displayed. It works
                       only when frame=True (default is False)
    ref_font_size   -- Reference font size used for all the fonts
    title_text      -- A string containing the title of the chart
    title_offset    -- The offset of the title from the rest of the plot
    color_numb      -- An integer between 0 and 9 that indicated the
                       seaborn-deep color to be used for the bars
    categ_labels    -- A boolean variable that defines if category labels
                       shall appear (on the y-axis)
    labels          -- List of category labels (strings) used only if
                       categ_labels=True.
                       They override the existing labels
    rot             -- The rotation of the x axsis label (numerical)
                       (the default is horizontal)
    show_values     -- If True, then numeric value labels will be shown on
                       each bar (default is False)
    omitted_value   -- The max value that shall not be shown in the bar
    percent         -- If true, it indicates that the values are in percentage
                       (default is False)
    center_al       -- A boolean variable that defines if the values shall be
                       written in the centre of the bar (default is True)
    visible_digits  -- Integer defining the number of decimal digits
                       to be seen in the value labels (the default is 2)
    """

    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns

    # Defining the suffix to be shown in the bar values
    if percent:
        p = '%'
    else:
        p = ""

    # Preparing the indexes for the x axis
    ind = list(range(len(data)))

    # Creating a new figure
    fig = plt.figure(figsize=(figsize_w, figsize_h))
    # Defining the used style
    color_list = sns.color_palette(palette='deep')

    # Removing y axis ticks
    plt.gca().yaxis.set_ticks_position('none')

    if frame is False:
        # Removing the borders, if so defined
        sns.despine(top=True, right=True, left=True, bottom=True)
        # Removing ticks and values in the x axes
        plt.gca().axes.get_xaxis().set_visible(False)
    elif grid:
        # Showing a vertical grig, if so defined
        plt.gca().xaxis.grid(color='grey', alpha=0.25,
                             linestyle='-', linewidth=1)

    # Adding a title (with some distance to the top of the plot)
    plt.title(title_text, fontsize=ref_font_size*1.33,
              loc='center', pad=title_offset)

    # Creating the bar plot
    plot = plt.barh(ind, data, color=color_list[color_numb])

    # Showing category labels on the y axes, if so defined
    if categ_labels:
        # Overriding the index value if category labels are provided
        if labels:
            plt.yticks(ind, labels, fontsize=ref_font_size, rotation=rot)
        else:
            plt.yticks(ind, data.index.tolist(),
                       fontsize=ref_font_size, rotation=rot)
    else:
        # Removing ticks and values in the y axes
        plt.gca().axes.get_yaxis().set_visible(False)

    # Showing the bar values, if so defined
    if show_values:
        # Iterating through the bars in the plot
        for bar in plot:
            # Getting bar height and width
            w, h = bar.get_width(), bar.get_height()
            # Printing the values only if they are bigger than the defined value
            if w > omitted_value:
                if center_al is True:
                    # Positioning the text in the centre of the bar horizontally
                    # and vertically
                    plt.text(bar.get_x() + w/2, bar.get_y() + h/2,
                             "{}".format(round(w, visible_digits))+p,
                             fontsize=ref_font_size, color="white",
                             ha="center", va="center")
                else:
                    # Positioning the text at the right of the bar horizontally
                    # and in the centre vertically
                    plt.text(bar.get_x() + w, bar.get_y() + h/2,
                             "{}".format(round(w, visible_digits))+p,
                             fontsize=ref_font_size,
                             ha="left", va="center")

    # Showing the plot without additional text
    plt.show()

3.3. Project-specific Functions

In [9]:
def find_last_day():
    '''
    This function reads in a certain directory to find the latest CSV file
    and returns the date of the last file in a string in the format mm-dd-yyyy
    '''

    # Getting the list of files in the daily reports folder
    for roots, dirs, files in os.walk('JHU_COVID-19/COVID-19/'
                                      'csse_covid_19_data/'
                                      'csse_covid_19_daily_reports'):
        file_list = files  # list of strings
        # Initializing a new list
        dates = []
        # Iterating through the original list
        for i in list(range(len(file_list))):
            file = file_list[i]
            # If is it a csv file ...
            if re.search("\S+[csv]", file):
                # Extracting the date into a list of string
                date = re.findall("[0-9]+[-][0-9]+[-][0-9]+", file)
                # Converting the format from string to date
                dt_date = dt.datetime.strptime(date[0], "%m-%d-%Y")
                # Appending the date to a list of dates (the new list)
                dates.append(dt_date)
    # Sorting the dates and taking the last one
    dates.sort(reverse=True)
    latest = dates[0]  # datetime
    # Converting the latest date to a string
    last_day = latest.strftime("%m-%d-%Y")

    return last_day
In [10]:
def extract_country(Country, State=None, days=0):
    '''
    This function allows selecting data related to a specific Country
    from the datasets produced by JHU.
    It takes the following input:
    - a string containing the Country name written with the first letter
    as a capital letter (mandatory)
    - a string containing the State name written with the first letter
    as a capital letter (optional)
    - an integer containing how many days to skip (default = 0)
    It returns a tuple of 3 lists containing data related to confirmed
    recovered and deceased cases.
    '''

    # Extracting confirmed cases
    if State:
        confirm = world_conf_clean[(world_conf_clean['Country/Region'] == Country) &
                                   (world_conf_clean['Province/State'] == State)]
    else:
        confirm = world_conf_clean[(world_conf_clean['Country/Region'] == Country)]
    # Extracting the columns containing the data for each day
    # by skipping a number of days equal to days
    confirm = confirm.iloc[:, 4+days:]
    # Copying the result into a list
    confirm_l = confirm.values.tolist()[0]
    
    # Extracting recovered cases
    if State:
        recov = world_recov_clean[(world_recov_clean['Country/Region'] == Country) &
                                  (world_recov_clean['Province/State'] == State)]
    else:
        recov = world_recov_clean[(world_recov_clean['Country/Region'] == Country)]
    # Extracting the columns containing the data for each day
    # by skipping a number of days equal to days
    recov = recov.iloc[:, 4+days:]
    # Copying the result into a list
    recov_l = recov.values.tolist()[0]

    # Extracting deceased cases
    if State:
        deceas = world_deceas_clean[(world_deceas_clean['Country/Region'] ==
                                     Country) &
                                    (world_deceas_clean['Province/State'] == State)]
    else:
        deceas = world_deceas_clean[(world_deceas_clean['Country/Region'] ==
                                     Country)]
    # Extracting the columns containing the data for each day
    # by skipping a number of days equal to days
    deceas = deceas.iloc[:, 4+days:]
    # Copying the result into a list
    deceas_l = deceas.values.tolist()[0]

    return confirm_l, recov_l, deceas_l
In [11]:
def extract_non_null(input_list):
    '''
    This function takes as input a list that contains a certain number of
    zero values, omits such values and returns what is left in a new list.
    '''

    # Initializing a list
    no_null = []
    # Looping through all the elements of the list
    for i in list(range(len(input_list))):
        if input_list[i] != 0:
            # Extracting non null values
            no_null.append(input_list[i])

    return no_null
In [12]:
def pop_perc(values, pop):
    '''
    This function takes the following inputs:
    - a list of floats in units
    - a float in million of units

    The function calculates the percentage values of the values in the list
    compared to the value in the single float multiplied one million times.
    The function is useful, for example, to calculate the number of
    confirmed Coronavirus cases pro capita
    (in percentage of the total pupulation in millions).

    The function retunts a list of floats.
    '''

    result = (pd.Series(values)/(pop*1000000))*100

    return result
In [13]:
def prep_country_data(Country, pop, State=None, days=0):
    '''
    This function allows to prepare the data for a specific Country.

    It takes the following inputs:

    - a string variable that contains the name of the Country
    written with the first letter as a capital letter (mandatory)
    - a float that contains the Country population in millions (mandatory)
    - a string containing the State name written with the first letter
    as a capital letter (optional)
    - an integer that tells the number of initial days in the time series
    to skip (default = 0)

    The function uses the following functions:

    - 'extract_country' to extract Country-specific information from
    the relevant dataframes
    - 'calc_increments' to calculate the daily increments in a time series
    - 'extract_non_null' to extract only the non null values of a time series
    - 'pop_perc' to calculate the values of a series in percentage of the Country population

    The output is a tuple with the following content:

    - a list containing a time series with the cumulative confirmed cases
    - a list containing a time series with the cumulative recovered cases
    - a list containing a time series with the cumulative deceased cases
    - a list containing a time series with the cumulative active cases
    - a list containing a time series with the daily increment
    in the confirmed cases
    - a list containing a time series with the daily increment
    in the deceased cases
    - a list containing a time series with the cumulative confirmed cases
    starting from the day of the first positive case
    - a list containing a time series with the cumulative confirmed cases per capita
    - a list containing a time series with the cumulative deceased cases per capita
    '''

    # Getting the name of thew Country in small letters
    country = Country.lower()
    
    # Extracting country-speficic data by using the function extract_country
    countryname_hiddendays = extract_country(Country, State, days)
    # Extracting the time series for the cumulative confirmed cases
    countryname_conf_hiddend = countryname_hiddendays[0]
    # Extracting the time series for the cumulative recovered cases
    countryname_recov_hiddend = countryname_hiddendays[1]
    # Extracting the time series for the cumulative deceased cases
    countryname_deceas_hiddend = countryname_hiddendays[2]
    # Calculating the active cases
    countryname_act_hiddend = list(np.array(countryname_conf_hiddend) - \
                                   np.array(countryname_recov_hiddend) - \
                                   np.array(countryname_deceas_hiddend))    
    # Extracting the time series for the daily increments in the confirmed cases
    countryname_conf_incr_hiddend = calc_increments(countryname_conf_hiddend)
    # Extracting the time series for the daily increments in the deceased cases
    countryname_deceas_incr_hiddend = calc_increments(countryname_deceas_hiddend)    
    # Extracting the complete time series about the cumulative confirmed cases
    complete_conf_series = extract_country(Country, State, 0)
    # Extracting the time series for the cumulative confirmed cases
    # starting from the day of the first positive case
    countryname_conf_pos = extract_non_null(complete_conf_series[0])
    # Extracting the time series for the cumulative confirmed cases per capita
    countryname_conf_hiddend_perc = pop_perc(countryname_conf_hiddend, pop)
    # Extracting the time series for the cumulative deceased cases per capita
    countryname_deceas_hiddend_perc = pop_perc(countryname_deceas_hiddend, pop)

    return countryname_conf_hiddend, \
           countryname_recov_hiddend, \
           countryname_deceas_hiddend, \
           countryname_act_hiddend, \
           countryname_conf_incr_hiddend, \
           countryname_deceas_incr_hiddend, \
           countryname_conf_pos, \
           countryname_conf_hiddend_perc, \
           countryname_deceas_hiddend_perc        
In [14]:
def find_error_days(listname):
    '''
    This function:
    takes a list,
    finds if the list contains negative increments by using the function find_neg_increm(listname),
    compares the position of such negative increments to the position of the days in the list days_tot and
    returns the corresponding days in a new list
    '''
    
    
    # Initializing a list to contain the positions in the list containing negative increments
    posit = []
    # Initializing a list to contain the days corresponding to negative increments in the list
    result = []
    # Checking for negative increments in the input list and storing their positions
    for position, item in enumerate(find_neg_increm(listname)):
        if item == 1:
            posit.append(position)
    # Finding the corresponding day 
    for position, item in enumerate(days_tot):
        if position in posit:
            result.append(item)
    print(result)

4. Dumping and Collecting the Data

The source csv files are located in the following directoryies:

  • JHU_COVID-19/COVID-19/csse_covid_19_data/csse_covid_19_time_series
  • JHU_COVID-19/COVID-19/csse_covid_19_data/csse_covid_19_daily_reports

Those directory shall be located under the directory containing this notebook.

In [15]:
# Loading the data files into pandas dataframes
# Loading the world time series
world_confirmed = pd.read_csv('JHU_COVID-19/COVID-19/csse_covid_19_data/'
                              'csse_covid_19_time_series/'
                              'time_series_covid19_confirmed_global.csv')
world_recovered = pd.read_csv('JHU_COVID-19/COVID-19/csse_covid_19_data/'
                             'csse_covid_19_time_series/'
                             'time_series_covid19_recovered_global.csv')
world_deceased = pd.read_csv('JHU_COVID-19/COVID-19/csse_covid_19_data/'
                             'csse_covid_19_time_series/'
                             'time_series_covid19_deaths_global.csv')
In [16]:
# Uploading the latest daily report
last_day = find_last_day()  # calling the function last_day
daily_report = pd.read_csv('JHU_COVID-19/COVID-19/csse_covid_19_data/'
                           'csse_covid_19_daily_reports/' + last_day + '.csv')

File descriptions

  • time_series_covid19_confirmed_global.csv: confirmed cases for each day for each Country
  • time_series_covid19_recovered_global.csv: recovered cases for each day for each Country
  • time_series_covid19_deaths_global.csv: confirmed cases for each day for each Country
  • mm-dd-yyyy.csv: last available daily report
In [17]:
# Storing the total population for the Countries of interest (in millions)
# (source: Google)
italy_pop = 60.48
spain_pop = 46.66
germany_pop = 82.79
france_pop = 66.99
switzerland_pop = 8.57
netherlands_pop = 17.18
austria_pop = 8.822
belgium_pop = 11.4
portugal_pop = 10.29
luxembourg_pop = 0.602
poland_pop = 37.97
ireland_pop = 4.904
estonia_pop = 1.328
denmark_pop = 5.603
norway_pop = 5.368
sweden_pop = 10.12
iceland_pop = 0.364
finland_pop = 5.513
uk_pop = 66.44
us_pop = 327.2
hubei_pop = 58.5
china_pop = 1386
restchina_pop = china_pop-hubei_pop
brazil_pop = 212.559
russia_pop = 145.9
india_pop = 1380
In [18]:
# Storing the population density for the Countries of interest (people/km2)
# (source: Google)
italy_dens = 201.3
spain_dens = 91.4
germany_dens = 240
france_dens = 122.34
switzerland_dens = 219
netherlands_dens = 488
austria_dens = 109
belgium_dens = 383
portugal_dens = 111
luxembourg_dens = 242
poland_dens = 124
ireland_dens = 72
estonia_dens = 31
denmark_dens = 134
norway_dens = 15
sweden_dens = 25
iceland_dens = 3
finland_dens = 15
uk_dens = 274
us_dens = 36
hubei_dens = 310
china_dens = 145
brazil_dens = 25
russia_dens = 8.54
india_dens = 464
In [19]:
# Storing the median age for the Countries of interest
# source: https://en.wikipedia.org/wiki/List_of_countries_by_median_age
italy_median_age = 45.5
spain_median_age = 42.7
germany_median_age = 45.7
france_median_age = 41.4
switzerland_median_age = 42.4
netherlands_median_age = 42.6
austria_median_age = 44.0
belgium_median_age = 41.4
portugal_median_age = 42.2
luxembourg_median_age = 39.3
poland_median_age = 39.7
ireland_median_age = 36.5
estonia_median_age = 41.6
denmark_median_age = 42.2
norway_median_age = 39.2
sweden_median_age = 41.2
iceland_median_age = 36.5
finland_median_age = 42.5
uk_median_age = 40.5
us_median_age = 38.1
china_median_age = 37.4
brazil_median_age = 31.4
russia_median_age = 38.6
india_median_age = 26.8
In [20]:
# List of containment actions taken by the Finnish Government

# Creating a dataframe
measures = pd.DataFrame(columns=['Date', 'Actions'])

# Adding the actions
measures = measures.append(pd.Series(["12.3.",
"First containment measures: gathering of more than 500 people banned"],
index=measures.columns), ignore_index=True)

measures = measures.append(pd.Series(["16.3.",
"State of emergency declared: closing shools, universities, museums, theatres, \
libraries, sport facilities; gathering of more than 10 people banned"],
index=measures.columns), ignore_index=True)

measures = measures.append(pd.Series(["28.3.",
"Additional containment measures: Uusimaa region borders closed,  \
restaurant dining forbidden"],
index=measures.columns), ignore_index=True)

measures = measures.append(pd.Series(["11.4.",
"Additional containment measures: No passengers in ships from Germany, Sweden, Estonia"],
index=measures.columns), ignore_index=True)

measures = measures.append(pd.Series(["15.4.",
"First releasing measures: Uusima border re-opened"],
index=measures.columns), ignore_index=True)

measures = measures.append(pd.Series(["14.5.",
"More releasing misures: schools opening, business travell allowed within Schengen"],
index=measures.columns), ignore_index=True)

measures = measures.append(pd.Series(["1.6.",
"Further releasing: gathering up to 50 people allowed, reopening of bars and restaurants, \
reopening of museums and theatres"],
index=measures.columns), ignore_index=True)

measures = measures.append(pd.Series(["15.6.",
"End of state of emergency"],
index=measures.columns), ignore_index=True)

5. Data Analysis

5.1. Summary

Preliminary Data Analysis

The 3 time series files have columns for Province/State, Country/Region, latitude, longitude and data for each day. The columns related to the day are named in the format m/d/yy.

Each entry represents a different location. One Country can be associated with more than one State/Province and in this case one Country has more than one entry. This happens for US, China, Canada, France, Australia, United Kingdom, Netherlands and Denmark.

The daily report file has columns for Province/State, Country/Region, latitude, longitude and time stamp as well as cumulative confirmed, deaths and recovered cases.

Data Cleansing

NaN values have been handled by filling with the string "Not applicable".

Data Preparation

Separate datasets with no GPS coordinates and no time stamp have been created.

Separate datasets have been created to group data by Country.

A list of relevant dates for the plots has been created.

Country specific data has been extracted.

World-wide grand totals have been calculated.

A summary of the created datasets is available in section 5.5.

5.2. Preliminary Data Analysis

In [21]:
# Showing basic dataframe info
df_basic_data(world_confirmed)
Dataframe name: world_confirmed 

Dataframe length: 266 

Number of columns: 186 

Dataframe's columns names, column data types, amount of distint (non null) values
and amount of null values for each column:
Out[21]:
Data_Type Amount_of_Distint_Values Amount_of_Null_Values
Province/State object 81 185
Country/Region object 188 0
Lat float64 262 0
Long float64 263 0
1/22/20 int64 11 0
1/23/20 int64 15 0
1/24/20 int64 19 0
1/25/20 int64 28 0
1/26/20 int64 29 0
1/27/20 int64 33 0
1/28/20 int64 36 0
1/29/20 int64 37 0
1/30/20 int64 39 0
1/31/20 int64 41 0
2/1/20 int64 44 0
2/2/20 int64 43 0
2/3/20 int64 43 0
2/4/20 int64 45 0
2/5/20 int64 46 0
2/6/20 int64 46 0
2/7/20 int64 45 0
2/8/20 int64 46 0
2/9/20 int64 46 0
2/10/20 int64 45 0
2/11/20 int64 48 0
2/12/20 int64 46 0
2/13/20 int64 48 0
2/14/20 int64 46 0
2/15/20 int64 46 0
2/16/20 int64 48 0
2/17/20 int64 49 0
2/18/20 int64 49 0
2/19/20 int64 50 0
2/20/20 int64 47 0
2/21/20 int64 50 0
2/22/20 int64 52 0
2/23/20 int64 51 0
2/24/20 int64 52 0
2/25/20 int64 56 0
2/26/20 int64 53 0
2/27/20 int64 56 0
2/28/20 int64 57 0
2/29/20 int64 61 0
3/1/20 int64 63 0
3/2/20 int64 65 0
3/3/20 int64 65 0
3/4/20 int64 70 0
3/5/20 int64 70 0
3/6/20 int64 78 0
3/7/20 int64 78 0
3/8/20 int64 86 0
3/9/20 int64 84 0
3/10/20 int64 90 0
3/11/20 int64 95 0
3/12/20 int64 99 0
3/13/20 int64 107 0
3/14/20 int64 111 0
3/15/20 int64 119 0
3/16/20 int64 122 0
3/17/20 int64 132 0
3/18/20 int64 135 0
3/19/20 int64 139 0
3/20/20 int64 143 0
3/21/20 int64 147 0
3/22/20 int64 160 0
3/23/20 int64 164 0
3/24/20 int64 167 0
3/25/20 int64 170 0
3/26/20 int64 176 0
3/27/20 int64 189 0
3/28/20 int64 185 0
3/29/20 int64 195 0
3/30/20 int64 184 0
3/31/20 int64 192 0
4/1/20 int64 192 0
4/2/20 int64 195 0
4/3/20 int64 200 0
4/4/20 int64 206 0
4/5/20 int64 200 0
4/6/20 int64 202 0
4/7/20 int64 206 0
4/8/20 int64 208 0
4/9/20 int64 215 0
4/10/20 int64 215 0
4/11/20 int64 210 0
4/12/20 int64 213 0
4/13/20 int64 221 0
4/14/20 int64 217 0
4/15/20 int64 217 0
4/16/20 int64 217 0
4/17/20 int64 216 0
4/18/20 int64 221 0
4/19/20 int64 222 0
4/20/20 int64 223 0
4/21/20 int64 225 0
4/22/20 int64 231 0
4/23/20 int64 231 0
4/24/20 int64 231 0
4/25/20 int64 232 0
4/26/20 int64 227 0
4/27/20 int64 228 0
4/28/20 int64 229 0
4/29/20 int64 232 0
4/30/20 int64 227 0
5/1/20 int64 232 0
5/2/20 int64 234 0
5/3/20 int64 231 0
5/4/20 int64 233 0
5/5/20 int64 234 0
5/6/20 int64 235 0
5/7/20 int64 235 0
5/8/20 int64 231 0
5/9/20 int64 236 0
5/10/20 int64 237 0
5/11/20 int64 234 0
5/12/20 int64 235 0
5/13/20 int64 234 0
5/14/20 int64 232 0
5/15/20 int64 236 0
5/16/20 int64 235 0
5/17/20 int64 239 0
5/18/20 int64 234 0
5/19/20 int64 233 0
5/20/20 int64 236 0
5/21/20 int64 232 0
5/22/20 int64 238 0
5/23/20 int64 241 0
5/24/20 int64 242 0
5/25/20 int64 239 0
5/26/20 int64 240 0
5/27/20 int64 241 0
5/28/20 int64 244 0
5/29/20 int64 241 0
5/30/20 int64 243 0
5/31/20 int64 243 0
6/1/20 int64 242 0
6/2/20 int64 247 0
6/3/20 int64 248 0
6/4/20 int64 247 0
6/5/20 int64 247 0
6/6/20 int64 243 0
6/7/20 int64 243 0
6/8/20 int64 245 0
6/9/20 int64 245 0
6/10/20 int64 251 0
6/11/20 int64 249 0
6/12/20 int64 248 0
6/13/20 int64 244 0
6/14/20 int64 246 0
6/15/20 int64 247 0
6/16/20 int64 243 0
6/17/20 int64 244 0
6/18/20 int64 249 0
6/19/20 int64 249 0
6/20/20 int64 249 0
6/21/20 int64 252 0
6/22/20 int64 249 0
6/23/20 int64 249 0
6/24/20 int64 247 0
6/25/20 int64 249 0
6/26/20 int64 249 0
6/27/20 int64 248 0
6/28/20 int64 249 0
6/29/20 int64 242 0
6/30/20 int64 247 0
7/1/20 int64 250 0
7/2/20 int64 252 0
7/3/20 int64 252 0
7/4/20 int64 252 0
7/5/20 int64 253 0
7/6/20 int64 253 0
7/7/20 int64 255 0
7/8/20 int64 254 0
7/9/20 int64 254 0
7/10/20 int64 256 0
7/11/20 int64 259 0
7/12/20 int64 260 0
7/13/20 int64 255 0
7/14/20 int64 254 0
7/15/20 int64 254 0
7/16/20 int64 247 0
7/17/20 int64 251 0
7/18/20 int64 251 0
7/19/20 int64 249 0
7/20/20 int64 251 0
7/21/20 int64 253 0
In [22]:
# Showing basic dataframe info
df_basic_data(world_recovered)
Dataframe name: world_recovered 

Dataframe length: 253 

Number of columns: 186 

Dataframe's columns names, column data types, amount of distint (non null) values
and amount of null values for each column:
Out[22]:
Data_Type Amount_of_Distint_Values Amount_of_Null_Values
Province/State object 67 186
Country/Region object 188 0
Lat float64 252 0
Long float64 252 0
1/22/20 int64 2 0
1/23/20 int64 3 0
1/24/20 int64 4 0
1/25/20 int64 4 0
1/26/20 int64 4 0
1/27/20 int64 6 0
1/28/20 int64 7 0
1/29/20 int64 7 0
1/30/20 int64 7 0
1/31/20 int64 10 0
2/1/20 int64 12 0
2/2/20 int64 16 0
2/3/20 int64 18 0
2/4/20 int64 18 0
2/5/20 int64 20 0
2/6/20 int64 24 0
2/7/20 int64 28 0
2/8/20 int64 31 0
2/9/20 int64 29 0
2/10/20 int64 28 0
2/11/20 int64 32 0
2/12/20 int64 34 0
2/13/20 int64 35 0
2/14/20 int64 37 0
2/15/20 int64 38 0
2/16/20 int64 40 0
2/17/20 int64 39 0
2/18/20 int64 39 0
2/19/20 int64 42 0
2/20/20 int64 40 0
2/21/20 int64 43 0
2/22/20 int64 43 0
2/23/20 int64 43 0
2/24/20 int64 42 0
2/25/20 int64 45 0
2/26/20 int64 45 0
2/27/20 int64 45 0
2/28/20 int64 46 0
2/29/20 int64 50 0
3/1/20 int64 50 0
3/2/20 int64 49 0
3/3/20 int64 50 0
3/4/20 int64 49 0
3/5/20 int64 50 0
3/6/20 int64 50 0
3/7/20 int64 50 0
3/8/20 int64 51 0
3/9/20 int64 51 0
3/10/20 int64 52 0
3/11/20 int64 56 0
3/12/20 int64 58 0
3/13/20 int64 59 0
3/14/20 int64 56 0
3/15/20 int64 56 0
3/16/20 int64 55 0
3/17/20 int64 58 0
3/18/20 int64 61 0
3/19/20 int64 62 0
3/20/20 int64 66 0
3/21/20 int64 72 0
3/22/20 int64 71 0
3/23/20 int64 71 0
3/24/20 int64 78 0
3/25/20 int64 79 0
3/26/20 int64 87 0
3/27/20 int64 89 0
3/28/20 int64 97 0
3/29/20 int64 95 0
3/30/20 int64 105 0
3/31/20 int64 104 0
4/1/20 int64 113 0
4/2/20 int64 118 0
4/3/20 int64 124 0
4/4/20 int64 122 0
4/5/20 int64 125 0
4/6/20 int64 134 0
4/7/20 int64 141 0
4/8/20 int64 140 0
4/9/20 int64 142 0
4/10/20 int64 147 0
4/11/20 int64 149 0
4/12/20 int64 157 0
4/13/20 int64 155 0
4/14/20 int64 162 0
4/15/20 int64 158 0
4/16/20 int64 165 0
4/17/20 int64 173 0
4/18/20 int64 165 0
4/19/20 int64 174 0
4/20/20 int64 174 0
4/21/20 int64 176 0
4/22/20 int64 185 0
4/23/20 int64 187 0
4/24/20 int64 190 0
4/25/20 int64 190 0
4/26/20 int64 193 0
4/27/20 int64 192 0
4/28/20 int64 188 0
4/29/20 int64 195 0
4/30/20 int64 199 0
5/1/20 int64 199 0
5/2/20 int64 202 0
5/3/20 int64 201 0
5/4/20 int64 202 0
5/5/20 int64 204 0
5/6/20 int64 202 0
5/7/20 int64 199 0
5/8/20 int64 207 0
5/9/20 int64 207 0
5/10/20 int64 206 0
5/11/20 int64 208 0
5/12/20 int64 205 0
5/13/20 int64 208 0
5/14/20 int64 210 0
5/15/20 int64 210 0
5/16/20 int64 209 0
5/17/20 int64 212 0
5/18/20 int64 213 0
5/19/20 int64 213 0
5/20/20 int64 218 0
5/21/20 int64 211 0
5/22/20 int64 213 0
5/23/20 int64 212 0
5/24/20 int64 215 0
5/25/20 int64 216 0
5/26/20 int64 215 0
5/27/20 int64 217 0
5/28/20 int64 216 0
5/29/20 int64 217 0
5/30/20 int64 216 0
5/31/20 int64 218 0
6/1/20 int64 216 0
6/2/20 int64 216 0
6/3/20 int64 218 0
6/4/20 int64 214 0
6/5/20 int64 218 0
6/6/20 int64 222 0
6/7/20 int64 224 0
6/8/20 int64 225 0
6/9/20 int64 226 0
6/10/20 int64 225 0
6/11/20 int64 227 0
6/12/20 int64 228 0
6/13/20 int64 228 0
6/14/20 int64 230 0
6/15/20 int64 229 0
6/16/20 int64 228 0
6/17/20 int64 227 0
6/18/20 int64 227 0
6/19/20 int64 231 0
6/20/20 int64 230 0
6/21/20 int64 231 0
6/22/20 int64 233 0
6/23/20 int64 232 0
6/24/20 int64 233 0
6/25/20 int64 232 0
6/26/20 int64 231 0
6/27/20 int64 233 0
6/28/20 int64 231 0
6/29/20 int64 234 0
6/30/20 int64 231 0
7/1/20 int64 230 0
7/2/20 int64 228 0
7/3/20 int64 227 0
7/4/20 int64 230 0
7/5/20 int64 231 0
7/6/20 int64 227 0
7/7/20 int64 231 0
7/8/20 int64 230 0
7/9/20 int64 233 0
7/10/20 int64 236 0
7/11/20 int64 236 0
7/12/20 int64 237 0
7/13/20 int64 237 0
7/14/20 int64 231 0
7/15/20 int64 235 0
7/16/20 int64 233 0
7/17/20 int64 234 0
7/18/20 int64 232 0
7/19/20 int64 231 0
7/20/20 int64 229 0
7/21/20 int64 229 0
In [23]:
# Showing basic dataframe info
df_basic_data(world_deceased)
Dataframe name: world_deceased 

Dataframe length: 266 

Number of columns: 186 

Dataframe's columns names, column data types, amount of distint (non null) values
and amount of null values for each column:
Out[23]:
Data_Type Amount_of_Distint_Values Amount_of_Null_Values
Province/State object 81 185
Country/Region object 188 0
Lat float64 262 0
Long float64 263 0
1/22/20 int64 2 0
1/23/20 int64 3 0
1/24/20 int64 3 0
1/25/20 int64 3 0
1/26/20 int64 3 0
1/27/20 int64 3 0
1/28/20 int64 3 0
1/29/20 int64 4 0
1/30/20 int64 4 0
1/31/20 int64 4 0
2/1/20 int64 4 0
2/2/20 int64 4 0
2/3/20 int64 4 0
2/4/20 int64 4 0
2/5/20 int64 4 0
2/6/20 int64 5 0
2/7/20 int64 5 0
2/8/20 int64 6 0
2/9/20 int64 6 0
2/10/20 int64 7 0
2/11/20 int64 8 0
2/12/20 int64 7 0
2/13/20 int64 9 0
2/14/20 int64 9 0
2/15/20 int64 10 0
2/16/20 int64 10 0
2/17/20 int64 10 0
2/18/20 int64 10 0
2/19/20 int64 10 0
2/20/20 int64 10 0
2/21/20 int64 10 0
2/22/20 int64 10 0
2/23/20 int64 11 0
2/24/20 int64 12 0
2/25/20 int64 13 0
2/26/20 int64 11 0
2/27/20 int64 13 0
2/28/20 int64 13 0
2/29/20 int64 15 0
3/1/20 int64 15 0
3/2/20 int64 15 0
3/3/20 int64 15 0
3/4/20 int64 16 0
3/5/20 int64 15 0
3/6/20 int64 17 0
3/7/20 int64 17 0
3/8/20 int64 17 0
3/9/20 int64 16 0
3/10/20 int64 17 0
3/11/20 int64 19 0
3/12/20 int64 20 0
3/13/20 int64 22 0
3/14/20 int64 22 0
3/15/20 int64 24 0
3/16/20 int64 25 0
3/17/20 int64 25 0
3/18/20 int64 27 0
3/19/20 int64 28 0
3/20/20 int64 31 0
3/21/20 int64 32 0
3/22/20 int64 34 0
3/23/20 int64 38 0
3/24/20 int64 40 0
3/25/20 int64 40 0
3/26/20 int64 45 0
3/27/20 int64 48 0
3/28/20 int64 50 0
3/29/20 int64 54 0
3/30/20 int64 55 0
3/31/20 int64 60 0
4/1/20 int64 61 0
4/2/20 int64 60 0
4/3/20 int64 68 0
4/4/20 int64 67 0
4/5/20 int64 69 0
4/6/20 int64 74 0
4/7/20 int64 75 0
4/8/20 int64 78 0
4/9/20 int64 79 0
4/10/20 int64 83 0
4/11/20 int64 82 0
4/12/20 int64 82 0
4/13/20 int64 84 0
4/14/20 int64 88 0
4/15/20 int64 87 0
4/16/20 int64 86 0
4/17/20 int64 93 0
4/18/20 int64 94 0
4/19/20 int64 92 0
4/20/20 int64 93 0
4/21/20 int64 91 0
4/22/20 int64 96 0
4/23/20 int64 97 0
4/24/20 int64 99 0
4/25/20 int64 96 0
4/26/20 int64 99 0
4/27/20 int64 100 0
4/28/20 int64 100 0
4/29/20 int64 103 0
4/30/20 int64 102 0
5/1/20 int64 102 0
5/2/20 int64 100 0
5/3/20 int64 105 0
5/4/20 int64 109 0
5/5/20 int64 106 0
5/6/20 int64 109 0
5/7/20 int64 110 0
5/8/20 int64 108 0
5/9/20 int64 104 0
5/10/20 int64 110 0
5/11/20 int64 107 0
5/12/20 int64 113 0
5/13/20 int64 113 0
5/14/20 int64 112 0
5/15/20 int64 118 0
5/16/20 int64 114 0
5/17/20 int64 119 0
5/18/20 int64 117 0
5/19/20 int64 116 0
5/20/20 int64 117 0
5/21/20 int64 114 0
5/22/20 int64 122 0
5/23/20 int64 122 0
5/24/20 int64 116 0
5/25/20 int64 118 0
5/26/20 int64 121 0
5/27/20 int64 123 0
5/28/20 int64 124 0
5/29/20 int64 127 0
5/30/20 int64 125 0
5/31/20 int64 122 0
6/1/20 int64 124 0
6/2/20 int64 125 0
6/3/20 int64 125 0
6/4/20 int64 125 0
6/5/20 int64 126 0
6/6/20 int64 123 0
6/7/20 int64 125 0
6/8/20 int64 127 0
6/9/20 int64 126 0
6/10/20 int64 127 0
6/11/20 int64 128 0
6/12/20 int64 128 0
6/13/20 int64 128 0
6/14/20 int64 128 0
6/15/20 int64 130 0
6/16/20 int64 129 0
6/17/20 int64 131 0
6/18/20 int64 129 0
6/19/20 int64 134 0
6/20/20 int64 130 0
6/21/20 int64 133 0
6/22/20 int64 137 0
6/23/20 int64 137 0
6/24/20 int64 135 0
6/25/20 int64 138 0
6/26/20 int64 139 0
6/27/20 int64 135 0
6/28/20 int64 135 0
6/29/20 int64 138 0
6/30/20 int64 141 0
7/1/20 int64 140 0
7/2/20 int64 136 0
7/3/20 int64 141 0
7/4/20 int64 140 0
7/5/20 int64 141 0
7/6/20 int64 142 0
7/7/20 int64 145 0
7/8/20 int64 143 0
7/9/20 int64 141 0
7/10/20 int64 143 0
7/11/20 int64 143 0
7/12/20 int64 140 0
7/13/20 int64 143 0
7/14/20 int64 139 0
7/15/20 int64 144 0
7/16/20 int64 137 0
7/17/20 int64 140 0
7/18/20 int64 148 0
7/19/20 int64 143 0
7/20/20 int64 143 0
7/21/20 int64 149 0
In [24]:
# Showing basic dataframe info
df_basic_data(daily_report)
Dataframe name: daily_report 

Dataframe length: 3924 

Number of columns: 14 

Dataframe's columns names, column data types, amount of distint (non null) values
and amount of null values for each column:
Out[24]:
Data_Type Amount_of_Distint_Values Amount_of_Null_Values
FIPS float64 3229 695
Admin2 object 1903 690
Province_State object 561 168
Country_Region object 188 0
Last_Update object 10 0
Lat float64 3844 78
Long_ float64 3834 78
Confirmed int64 1632 0
Deaths int64 475 0
Recovered int64 564 0
Active float64 1488 2
Combined_Key object 3924 0
Incidence_Rate float64 3827 78
Case-Fatality_Ratio float64 2076 51
In [25]:
# Checking how data looks like
print("world_confirmed")
world_confirmed.head()
world_confirmed
Out[25]:
Province/State Country/Region Lat Long 1/22/20 1/23/20 1/24/20 1/25/20 1/26/20 1/27/20 1/28/20 1/29/20 1/30/20 1/31/20 2/1/20 2/2/20 2/3/20 2/4/20 2/5/20 2/6/20 2/7/20 2/8/20 2/9/20 2/10/20 2/11/20 2/12/20 2/13/20 2/14/20 2/15/20 2/16/20 2/17/20 2/18/20 2/19/20 2/20/20 2/21/20 2/22/20 2/23/20 2/24/20 2/25/20 2/26/20 2/27/20 2/28/20 2/29/20 3/1/20 3/2/20 3/3/20 3/4/20 3/5/20 3/6/20 3/7/20 3/8/20 3/9/20 3/10/20 3/11/20 3/12/20 3/13/20 3/14/20 3/15/20 3/16/20 3/17/20 3/18/20 3/19/20 3/20/20 3/21/20 3/22/20 3/23/20 3/24/20 3/25/20 3/26/20 3/27/20 3/28/20 3/29/20 3/30/20 3/31/20 4/1/20 4/2/20 4/3/20 4/4/20 4/5/20 4/6/20 4/7/20 4/8/20 4/9/20 4/10/20 4/11/20 4/12/20 4/13/20 4/14/20 4/15/20 4/16/20 4/17/20 4/18/20 4/19/20 4/20/20 4/21/20 4/22/20 4/23/20 4/24/20 4/25/20 4/26/20 4/27/20 4/28/20 4/29/20 4/30/20 5/1/20 5/2/20 5/3/20 5/4/20 5/5/20 5/6/20 5/7/20 5/8/20 5/9/20 5/10/20 5/11/20 5/12/20 5/13/20 5/14/20 5/15/20 5/16/20 5/17/20 5/18/20 5/19/20 5/20/20 5/21/20 5/22/20 5/23/20 5/24/20 5/25/20 5/26/20 5/27/20 5/28/20 5/29/20 5/30/20 5/31/20 6/1/20 6/2/20 6/3/20 6/4/20 6/5/20 6/6/20 6/7/20 6/8/20 6/9/20 6/10/20 6/11/20 6/12/20 6/13/20 6/14/20 6/15/20 6/16/20 6/17/20 6/18/20 6/19/20 6/20/20 6/21/20 6/22/20 6/23/20 6/24/20 6/25/20 6/26/20 6/27/20 6/28/20 6/29/20 6/30/20 7/1/20 7/2/20 7/3/20 7/4/20 7/5/20 7/6/20 7/7/20 7/8/20 7/9/20 7/10/20 7/11/20 7/12/20 7/13/20 7/14/20 7/15/20 7/16/20 7/17/20 7/18/20 7/19/20 7/20/20 7/21/20
0 NaN Afghanistan 33.93911 67.709953 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 4 4 5 7 7 7 11 16 21 22 22 22 24 24 40 40 74 84 94 110 110 120 170 174 237 273 281 299 349 367 423 444 484 521 555 607 665 714 784 840 906 933 996 1026 1092 1176 1279 1351 1463 1531 1703 1828 1939 2171 2335 2469 2704 2894 3224 3392 3563 3778 4033 4402 4687 4963 5226 5639 6053 6402 6664 7072 7653 8145 8676 9216 9998 10582 11173 11831 12456 13036 13659 14525 15205 15750 16509 17267 18054 18969 19551 20342 20917 21459 22142 22890 23546 24102 24766 25527 26310 26874 27532 27878 28424 28833 29157 29481 29640 30175 30451 30616 30967 31238 31517 31836 32022 32324 32672 32951 33190 33384 33594 33908 34194 34366 34451 34455 34740 34994 35070 35229 35301 35475 35526 35615
1 NaN Albania 41.15330 20.168300 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 10 12 23 33 38 42 51 55 59 64 70 76 89 104 123 146 174 186 197 212 223 243 259 277 304 333 361 377 383 400 409 416 433 446 467 475 494 518 539 548 562 584 609 634 663 678 712 726 736 750 766 773 782 789 795 803 820 832 842 850 856 868 872 876 880 898 916 933 946 948 949 964 969 981 989 998 1004 1029 1050 1076 1099 1122 1137 1143 1164 1184 1197 1212 1232 1246 1263 1299 1341 1385 1416 1464 1521 1590 1672 1722 1788 1838 1891 1962 1995 2047 2114 2192 2269 2330 2402 2466 2535 2580 2662 2752 2819 2893 2964 3038 3106 3188 3278 3371 3454 3571 3667 3752 3851 3906 4008 4090 4171 4290
2 NaN Algeria 28.03390 1.659600 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 3 5 12 12 17 17 19 20 20 20 24 26 37 48 54 60 74 87 90 139 201 230 264 302 367 409 454 511 584 716 847 986 1171 1251 1320 1423 1468 1572 1666 1761 1825 1914 1983 2070 2160 2268 2418 2534 2629 2718 2811 2910 3007 3127 3256 3382 3517 3649 3848 4006 4154 4295 4474 4648 4838 4997 5182 5369 5558 5723 5891 6067 6253 6442 6629 6821 7019 7201 7377 7542 7728 7918 8113 8306 8503 8697 8857 8997 9134 9267 9394 9513 9626 9733 9831 9935 10050 10154 10265 10382 10484 10589 10698 10810 10919 11031 11147 11268 11385 11504 11631 11771 11920 12076 12248 12445 12685 12968 13273 13571 13907 14272 14657 15070 15500 15941 16404 16879 17348 17808 18242 18712 19195 19689 20216 20770 21355 21948 22549 23084 23691 24278
3 NaN Andorra 42.50630 1.521800 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 39 39 53 75 88 113 133 164 188 224 267 308 334 370 376 390 428 439 466 501 525 545 564 583 601 601 638 646 659 673 673 696 704 713 717 717 723 723 731 738 738 743 743 743 745 745 747 748 750 751 751 752 752 754 755 755 758 760 761 761 761 761 761 761 762 762 762 762 762 763 763 763 763 764 764 764 765 844 851 852 852 852 852 852 852 852 852 853 853 853 853 854 854 855 855 855 855 855 855 855 855 855 855 855 855 855 855 855 855 855 855 855 855 855 855 855 855 855 858 861 862 877 880 880 880 884 884
4 NaN Angola -11.20270 17.873900 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 2 3 3 3 4 4 5 7 7 7 8 8 8 10 14 16 17 19 19 19 19 19 19 19 19 19 19 24 24 24 24 25 25 25 25 26 27 27 27 27 30 35 35 35 36 36 36 43 43 45 45 45 45 48 48 48 48 50 52 52 58 60 61 69 70 70 71 74 81 84 86 86 86 86 86 86 88 91 92 96 113 118 130 138 140 142 148 155 166 172 176 183 186 189 197 212 212 259 267 276 284 291 315 328 346 346 346 386 386 396 458 462 506 525 541 576 607 638 687 705 749 779
In [26]:
# Checking how data looks like
print("world_recovered")
world_recovered.head()
world_recovered
Out[26]:
Province/State Country/Region Lat Long 1/22/20 1/23/20 1/24/20 1/25/20 1/26/20 1/27/20 1/28/20 1/29/20 1/30/20 1/31/20 2/1/20 2/2/20 2/3/20 2/4/20 2/5/20 2/6/20 2/7/20 2/8/20 2/9/20 2/10/20 2/11/20 2/12/20 2/13/20 2/14/20 2/15/20 2/16/20 2/17/20 2/18/20 2/19/20 2/20/20 2/21/20 2/22/20 2/23/20 2/24/20 2/25/20 2/26/20 2/27/20 2/28/20 2/29/20 3/1/20 3/2/20 3/3/20 3/4/20 3/5/20 3/6/20 3/7/20 3/8/20 3/9/20 3/10/20 3/11/20 3/12/20 3/13/20 3/14/20 3/15/20 3/16/20 3/17/20 3/18/20 3/19/20 3/20/20 3/21/20 3/22/20 3/23/20 3/24/20 3/25/20 3/26/20 3/27/20 3/28/20 3/29/20 3/30/20 3/31/20 4/1/20 4/2/20 4/3/20 4/4/20 4/5/20 4/6/20 4/7/20 4/8/20 4/9/20 4/10/20 4/11/20 4/12/20 4/13/20 4/14/20 4/15/20 4/16/20 4/17/20 4/18/20 4/19/20 4/20/20 4/21/20 4/22/20 4/23/20 4/24/20 4/25/20 4/26/20 4/27/20 4/28/20 4/29/20 4/30/20 5/1/20 5/2/20 5/3/20 5/4/20 5/5/20 5/6/20 5/7/20 5/8/20 5/9/20 5/10/20 5/11/20 5/12/20 5/13/20 5/14/20 5/15/20 5/16/20 5/17/20 5/18/20 5/19/20 5/20/20 5/21/20 5/22/20 5/23/20 5/24/20 5/25/20 5/26/20 5/27/20 5/28/20 5/29/20 5/30/20 5/31/20 6/1/20 6/2/20 6/3/20 6/4/20 6/5/20 6/6/20 6/7/20 6/8/20 6/9/20 6/10/20 6/11/20 6/12/20 6/13/20 6/14/20 6/15/20 6/16/20 6/17/20 6/18/20 6/19/20 6/20/20 6/21/20 6/22/20 6/23/20 6/24/20 6/25/20 6/26/20 6/27/20 6/28/20 6/29/20 6/30/20 7/1/20 7/2/20 7/3/20 7/4/20 7/5/20 7/6/20 7/7/20 7/8/20 7/9/20 7/10/20 7/11/20 7/12/20 7/13/20 7/14/20 7/15/20 7/16/20 7/17/20 7/18/20 7/19/20 7/20/20 7/21/20
0 NaN Afghanistan 33.93911 67.709953 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 5 5 10 10 10 15 18 18 29 32 32 32 32 32 40 43 54 99 112 131 135 150 166 179 188 188 207 220 228 252 260 310 331 345 397 421 458 468 472 502 558 558 610 648 691 745 745 778 801 850 930 938 996 1040 1075 1097 1128 1138 1209 1259 1303 1328 1428 1450 1522 1585 1762 1830 1875 2171 2651 3013 3326 3928 4201 4725 5164 5508 6158 7660 7962 8292 8764 8841 9260 9869 10174 10306 10674 12604 13934 14131 15651 16041 17331 19164 19366 20103 20179 20700 20847 20882 21135 21216 21254 21454 22456 22824 23151 23273 23634 23741 23741
1 NaN Albania 41.15330 20.168300 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 2 2 10 17 17 31 31 33 44 52 67 76 89 99 104 116 131 154 165 182 197 217 232 248 251 277 283 302 314 327 345 356 385 394 403 410 422 431 455 470 488 519 531 543 570 595 605 620 627 650 654 682 688 694 705 714 715 727 742 758 771 777 783 789 795 803 812 823 851 857 872 877 891 898 898 910 925 938 945 960 980 1001 1034 1039 1044 1055 1064 1077 1086 1114 1126 1134 1159 1195 1217 1250 1298 1346 1384 1438 1459 1516 1559 1592 1637 1657 1702 1744 1791 1832 1875 1881 1946 2014 2062 2091 2137 2214 2264 2311 2352 2397
2 NaN Algeria 28.03390 1.659600 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 8 8 12 12 12 12 12 32 32 32 65 65 24 65 29 29 31 31 37 46 61 61 62 90 90 90 113 237 347 405 460 591 601 691 708 783 846 894 1047 1099 1152 1204 1355 1408 1479 1508 1558 1651 1702 1779 1821 1872 1936 1998 2067 2197 2323 2467 2546 2678 2841 2998 3058 3158 3271 3409 3507 3625 3746 3968 4062 4256 4426 4784 4747 4918 5129 5277 5422 5549 5748 5894 6067 6218 6297 6453 6631 6717 6799 6951 7074 7255 7322 7420 7606 7735 7842 7943 8078 8196 8324 8422 8559 8674 8792 8920 9066 9202 9371 9674 9897 10040 10342 10832 11181 11492 11884 12094 12329 12637 13124 13124 13743 14019 14295 14792 15107 15430 15744 16051 16400 16646
3 NaN Andorra 42.50630 1.521800 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 10 10 10 10 16 21 26 31 39 52 58 71 71 128 128 128 169 169 191 205 235 248 282 309 333 344 344 344 385 398 423 468 468 472 493 499 514 521 526 537 545 550 550 568 576 596 604 615 617 624 628 639 639 652 653 653 663 676 676 681 684 692 694 698 733 735 738 741 741 744 751 757 759 780 781 781 781 789 789 791 792 792 792 792 796 797 797 797 799 799 799 799 799 799 800 800 800 800 800 800 802 802 803 803 803 803 803 803 803 803 803 803 803 803
4 NaN Angola -11.20270 17.873900 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 2 2 2 2 2 2 2 4 4 4 5 5 5 5 6 6 6 6 6 6 6 6 6 6 6 7 7 11 11 11 11 11 11 11 11 13 13 13 13 14 14 17 17 17 17 17 17 17 17 18 18 18 18 18 18 18 18 18 18 18 18 18 21 24 24 38 38 40 41 42 61 61 64 64 64 64 66 66 77 77 77 77 81 81 81 81 93 93 97 97 107 108 108 108 117 117 117 117 118 118 118 118 124 124 199 210 221 221 221
In [27]:
# Checking how data looks like
print("world_deceased")
world_deceased.head()
world_deceased
Out[27]:
Province/State Country/Region Lat Long 1/22/20 1/23/20 1/24/20 1/25/20 1/26/20 1/27/20 1/28/20 1/29/20 1/30/20 1/31/20 2/1/20 2/2/20 2/3/20 2/4/20 2/5/20 2/6/20 2/7/20 2/8/20 2/9/20 2/10/20 2/11/20 2/12/20 2/13/20 2/14/20 2/15/20 2/16/20 2/17/20 2/18/20 2/19/20 2/20/20 2/21/20 2/22/20 2/23/20 2/24/20 2/25/20 2/26/20 2/27/20 2/28/20 2/29/20 3/1/20 3/2/20 3/3/20 3/4/20 3/5/20 3/6/20 3/7/20 3/8/20 3/9/20 3/10/20 3/11/20 3/12/20 3/13/20 3/14/20 3/15/20 3/16/20 3/17/20 3/18/20 3/19/20 3/20/20 3/21/20 3/22/20 3/23/20 3/24/20 3/25/20 3/26/20 3/27/20 3/28/20 3/29/20 3/30/20 3/31/20 4/1/20 4/2/20 4/3/20 4/4/20 4/5/20 4/6/20 4/7/20 4/8/20 4/9/20 4/10/20 4/11/20 4/12/20 4/13/20 4/14/20 4/15/20 4/16/20 4/17/20 4/18/20 4/19/20 4/20/20 4/21/20 4/22/20 4/23/20 4/24/20 4/25/20 4/26/20 4/27/20 4/28/20 4/29/20 4/30/20 5/1/20 5/2/20 5/3/20 5/4/20 5/5/20 5/6/20 5/7/20 5/8/20 5/9/20 5/10/20 5/11/20 5/12/20 5/13/20 5/14/20 5/15/20 5/16/20 5/17/20 5/18/20 5/19/20 5/20/20 5/21/20 5/22/20 5/23/20 5/24/20 5/25/20 5/26/20 5/27/20 5/28/20 5/29/20 5/30/20 5/31/20 6/1/20 6/2/20 6/3/20 6/4/20 6/5/20 6/6/20 6/7/20 6/8/20 6/9/20 6/10/20 6/11/20 6/12/20 6/13/20 6/14/20 6/15/20 6/16/20 6/17/20 6/18/20 6/19/20 6/20/20 6/21/20 6/22/20 6/23/20 6/24/20 6/25/20 6/26/20 6/27/20 6/28/20 6/29/20 6/30/20 7/1/20 7/2/20 7/3/20 7/4/20 7/5/20 7/6/20 7/7/20 7/8/20 7/9/20 7/10/20 7/11/20 7/12/20 7/13/20 7/14/20 7/15/20 7/16/20 7/17/20 7/18/20 7/19/20 7/20/20 7/21/20
0 NaN Afghanistan 33.93911 67.709953 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 2 4 4 4 4 4 4 4 6 6 7 7 11 14 14 15 15 18 18 21 23 25 30 30 30 33 36 36 40 42 43 47 50 57 58 60 64 68 72 85 90 95 104 106 109 115 120 122 127 132 136 153 168 169 173 178 187 193 205 216 218 219 220 227 235 246 249 257 265 270 294 300 309 327 357 369 384 405 426 446 451 471 478 491 504 546 548 569 581 598 618 639 675 683 703 721 733 746 774 807 819 826 864 898 920 936 957 971 994 1010 1012 1048 1094 1113 1147 1164 1181 1185 1186
1 NaN Albania 41.15330 20.168300 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 2 2 2 2 2 4 5 5 6 8 10 10 11 15 15 16 17 20 20 21 22 22 23 23 23 23 23 24 25 26 26 26 26 26 26 27 27 27 27 28 28 30 30 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 32 32 33 33 33 33 33 33 33 33 33 33 33 34 34 34 34 34 35 36 36 36 36 37 38 39 42 43 44 44 45 47 49 51 53 55 58 62 65 69 72 74 76 79 81 83 83 85 89 93 95 97 101 104 107 111 112 113 117
2 NaN Algeria 28.03390 1.659600 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 3 4 4 4 7 9 11 15 17 17 19 21 25 26 29 31 35 44 58 86 105 130 152 173 193 205 235 256 275 293 313 326 336 348 364 367 375 384 392 402 407 415 419 425 432 437 444 450 453 459 463 465 470 476 483 488 494 502 507 515 522 529 536 542 548 555 561 568 575 582 592 600 609 617 623 630 638 646 653 661 667 673 681 690 698 707 715 724 732 741 751 760 767 777 788 799 811 825 837 845 852 861 869 878 885 892 897 905 912 920 928 937 946 952 959 968 978 988 996 1004 1011 1018 1028 1040 1052 1057 1068 1078 1087 1100
3 NaN Andorra 42.50630 1.521800 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 3 3 3 6 8 12 14 15 16 17 18 21 22 23 25 26 26 29 29 31 33 33 35 35 36 37 37 37 37 40 40 40 40 41 42 42 43 44 45 45 46 46 47 47 48 48 48 48 49 49 49 51 51 51 51 51 51 51 51 51 51 51 51 51 51 51 51 51 51 51 51 51 51 51 51 51 51 51 51 51 51 51 52 52 52 52 52 52 52 52 52 52 52 52 52 52 52 52 52 52 52 52 52 52 52 52 52 52 52 52 52 52 52 52 52 52 52 52
4 NaN Angola -11.20270 17.873900 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 5 5 6 6 6 6 7 8 8 9 9 10 10 10 10 10 10 11 11 13 15 17 18 19 19 19 21 21 22 23 23 26 26 26 27 28 29 29 29 29 30
In [28]:
# Checking how data looks like
print("daily_report")
daily_report.head()
daily_report
Out[28]:
FIPS Admin2 Province_State Country_Region Last_Update Lat Long_ Confirmed Deaths Recovered Active Combined_Key Incidence_Rate Case-Fatality_Ratio
0 45001.0 Abbeville South Carolina US 2020-07-22 04:34:42 34.223334 -82.461707 246 2 0 244.0 Abbeville, South Carolina, US 1002.976312 0.813008
1 22001.0 Acadia Louisiana US 2020-07-22 04:34:42 30.295065 -92.414197 1920 53 0 1867.0 Acadia, Louisiana, US 3094.528165 2.760417
2 51001.0 Accomack Virginia US 2020-07-22 04:34:42 37.767072 -75.632346 1057 15 0 1042.0 Accomack, Virginia, US 3270.825597 1.419111
3 16001.0 Ada Idaho US 2020-07-22 04:34:42 43.452658 -116.241552 6195 35 0 6160.0 Ada, Idaho, US 1286.371933 0.564972
4 19001.0 Adair Iowa US 2020-07-22 04:34:42 41.330756 -94.471059 20 0 0 20.0 Adair, Iowa, US 279.642058 0.000000
In [29]:
# Checking the Countries that are associated to more than one entry in the time series
print(world_confirmed['Country/Region'].value_counts()[world_confirmed['Country/Region']\
                                                    .value_counts() > 1].to_string())
China             33
Canada            14
France            11
United Kingdom    11
Australia          8
Netherlands        5
Denmark            3
In [30]:
# Checking the Countries that are associated to more than one entry in the daily report
print("Countries that are associated to more than one entry and number of entries\n")
print(daily_report['Country_Region'].value_counts()[daily_report['Country_Region']\
                                                    .value_counts() > 1].to_string())
Countries that are associated to more than one entry and number of entries

US                3240
Russia              83
Japan               49
India               36
Colombia            34
China               33
Mexico              32
Ukraine             27
Brazil              27
Peru                26
Sweden              21
Italy               21
Spain               20
Chile               17
Germany             17
Netherlands         17
United Kingdom      15
Canada              14
France              11
Australia            8
Pakistan             7
Denmark              3
In [31]:
# Checking the logic behind the classification in the time series
world_confirmed[world_confirmed['Country/Region'] == "Denmark"]
Out[31]:
Province/State Country/Region Lat Long 1/22/20 1/23/20 1/24/20 1/25/20 1/26/20 1/27/20 1/28/20 1/29/20 1/30/20 1/31/20 2/1/20 2/2/20 2/3/20 2/4/20 2/5/20 2/6/20 2/7/20 2/8/20 2/9/20 2/10/20 2/11/20 2/12/20 2/13/20 2/14/20 2/15/20 2/16/20 2/17/20 2/18/20 2/19/20 2/20/20 2/21/20 2/22/20 2/23/20 2/24/20 2/25/20 2/26/20 2/27/20 2/28/20 2/29/20 3/1/20 3/2/20 3/3/20 3/4/20 3/5/20 3/6/20 3/7/20 3/8/20 3/9/20 3/10/20 3/11/20 3/12/20 3/13/20 3/14/20 3/15/20 3/16/20 3/17/20 3/18/20 3/19/20 3/20/20 3/21/20 3/22/20 3/23/20 3/24/20 3/25/20 3/26/20 3/27/20 3/28/20 3/29/20 3/30/20 3/31/20 4/1/20 4/2/20 4/3/20 4/4/20 4/5/20 4/6/20 4/7/20 4/8/20 4/9/20 4/10/20 4/11/20 4/12/20 4/13/20 4/14/20 4/15/20 4/16/20 4/17/20 4/18/20 4/19/20 4/20/20 4/21/20 4/22/20 4/23/20 4/24/20 4/25/20 4/26/20 4/27/20 4/28/20 4/29/20 4/30/20 5/1/20 5/2/20 5/3/20 5/4/20 5/5/20 5/6/20 5/7/20 5/8/20 5/9/20 5/10/20 5/11/20 5/12/20 5/13/20 5/14/20 5/15/20 5/16/20 5/17/20 5/18/20 5/19/20 5/20/20 5/21/20 5/22/20 5/23/20 5/24/20 5/25/20 5/26/20 5/27/20 5/28/20 5/29/20 5/30/20 5/31/20 6/1/20 6/2/20 6/3/20 6/4/20 6/5/20 6/6/20 6/7/20 6/8/20 6/9/20 6/10/20 6/11/20 6/12/20 6/13/20 6/14/20 6/15/20 6/16/20 6/17/20 6/18/20 6/19/20 6/20/20 6/21/20 6/22/20 6/23/20 6/24/20 6/25/20 6/26/20 6/27/20 6/28/20 6/29/20 6/30/20 7/1/20 7/2/20 7/3/20 7/4/20 7/5/20 7/6/20 7/7/20 7/8/20 7/9/20 7/10/20 7/11/20 7/12/20 7/13/20 7/14/20 7/15/20 7/16/20 7/17/20 7/18/20 7/19/20 7/20/20 7/21/20
92 Faroe Islands Denmark 61.8926 -6.9118 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 2 2 2 2 2 3 9 11 18 47 58 72 80 92 115 118 122 132 140 144 155 159 168 169 173 177 179 181 181 183 184 184 184 184 184 184 184 184 184 184 184 184 185 185 185 185 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 188 188 188 188 188 188 188 188 188 188 188 188 188 188 191 191 191
93 Greenland Denmark 71.7069 -42.6043 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 2 2 2 4 4 5 6 6 10 10 10 10 10 10 10 10 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 12 12 12 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13
94 NaN Denmark 56.2639 9.5018 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 3 4 4 6 10 10 23 23 35 90 262 442 615 801 827 864 914 977 1057 1151 1255 1326 1395 1450 1591 1724 1877 2046 2201 2395 2577 2860 3107 3386 3757 4077 4369 4681 5071 5402 5635 5819 5996 6174 6318 6511 6681 6879 7073 7242 7384 7515 7695 7912 8073 8210 8445 8575 8698 8851 9008 9158 9311 9407 9523 9670 9821 9938 10083 10218 10319 10429 10513 10591 10667 10713 10791 10858 10927 10968 11044 11117 11182 11230 11289 11360 11387 11428 11480 11512 11593 11633 11669 11699 11734 11771 11811 11875 11924 11948 11962 12001 12016 12035 12099 12139 12193 12217 12250 12294 12344 12391 12391 12391 12527 12561 12615 12636 12675 12675 12675 12751 12768 12794 12815 12832 12832 12832 12878 12888 12900 12916 12946 12946 12946 13037 13061 13092 13124 13173 13173 13173 13262 13302
In [32]:
# Checking the logic behind the classification in the daily report
daily_report[daily_report['Country_Region'] == "Denmark"]
Out[32]:
FIPS Admin2 Province_State Country_Region Last_Update Lat Long_ Confirmed Deaths Recovered Active Combined_Key Incidence_Rate Case-Fatality_Ratio
3358 NaN NaN Faroe Islands Denmark 2020-07-22 04:34:42 61.8926 -6.9118 191 0 188 3.0 Faroe Islands, Denmark 390.872813 0.000000
3379 NaN NaN Greenland Denmark 2020-07-22 04:34:42 71.7069 -42.6043 13 0 13 0.0 Greenland, Denmark 22.898612 0.000000
3790 NaN NaN NaN Denmark 2020-07-22 04:34:42 56.2639 9.5018 13302 611 12261 430.0 Denmark 229.653553 4.593294
In [33]:
# Checking the logic behind the classification in the daily report
daily_report[daily_report['Country_Region'] == "Italy"]
Out[33]:
FIPS Admin2 Province_State Country_Region Last_Update Lat Long_ Confirmed Deaths Recovered Active Combined_Key Incidence_Rate Case-Fatality_Ratio
3230 NaN NaN Abruzzo Italy 2020-07-22 04:34:42 42.351222 13.398438 3344 470 2758 116.0 Abruzzo, Italy 254.959667 14.055024
3279 NaN NaN Basilicata Italy 2020-07-22 04:34:42 40.639471 15.805148 408 28 373 7.0 Basilicata, Italy 72.485783 6.862745
3299 NaN NaN Calabria Italy 2020-07-22 04:34:42 38.905976 16.594402 1239 97 1071 71.0 Calabria, Italy 63.632082 7.828894
3302 NaN NaN Campania Italy 2020-07-22 04:34:42 40.839566 14.250850 4839 434 4108 297.0 Campania, Italy 83.406703 8.968795
3353 NaN NaN Emilia-Romagna Italy 2020-07-22 04:34:42 44.494367 11.341721 29238 4279 23662 1297.0 Emilia-Romagna, Italy 655.637421 14.635064
3363 NaN NaN Friuli Venezia Giulia Italy 2020-07-22 04:34:42 45.649435 13.768136 3358 345 2914 99.0 Friuli Venezia Giulia, Italy 276.328566 10.273973
3472 NaN NaN Lazio Italy 2020-07-22 04:34:42 41.892770 12.483667 8456 853 6722 881.0 Lazio, Italy 143.831979 10.087512
3475 NaN NaN Liguria Italy 2020-07-22 04:34:42 44.411493 8.932699 10117 1565 8362 190.0 Liguria, Italy 652.440283 15.469013
3479 NaN NaN Lombardia Italy 2020-07-22 04:34:42 45.466794 9.190347 95582 16797 71775 7010.0 Lombardia, Italy 950.065076 17.573392
3496 NaN NaN Marche Italy 2020-07-22 04:34:42 43.616760 13.518875 6813 987 5682 144.0 Marche, Italy 446.674722 14.487010
3515 NaN NaN Molise Italy 2020-07-22 04:34:42 41.557748 14.659161 446 23 415 8.0 Molise, Italy 145.934290 5.156951
3571 NaN NaN P.A. Bolzano Italy 2020-07-22 04:34:42 46.499335 11.356624 2685 292 2294 99.0 P.A. Bolzano, Italy 504.397747 10.875233
3572 NaN NaN P.A. Trento Italy 2020-07-22 04:34:42 46.068935 11.121231 4891 405 4464 22.0 P.A. Trento, Italy 903.368562 8.280515
3582 NaN NaN Piemonte Italy 2020-07-22 04:34:42 45.073274 7.680687 31545 4123 26609 813.0 Piemonte, Italy 724.106064 13.070217
3591 NaN NaN Puglia Italy 2020-07-22 04:34:42 41.125596 16.867367 4556 548 3948 60.0 Puglia, Italy 113.078681 12.028095
3633 NaN NaN Sardegna Italy 2020-07-22 04:34:42 39.215312 9.110616 1379 134 1233 12.0 Sardegna, Italy 84.106341 9.717186
3647 NaN NaN Sicilia Italy 2020-07-22 04:34:42 38.115697 13.362357 3146 283 2706 157.0 Sicilia, Italy 62.921372 8.995550
3683 NaN NaN Toscana Italy 2020-07-22 04:34:42 43.769231 11.255889 10384 1131 8927 326.0 Toscana, Italy 278.418218 10.891757
3696 NaN NaN Umbria Italy 2020-07-22 04:34:42 43.106758 12.388247 1459 80 1365 14.0 Umbria, Italy 165.416688 5.483208
3708 NaN NaN Valle d'Aosta Italy 2020-07-22 04:34:42 45.737503 7.320149 1196 146 1049 1.0 Valle d'Aosta, Italy 951.729187 12.207358
3717 NaN NaN Veneto Italy 2020-07-22 04:34:42 45.434905 12.338452 19671 2053 16994 624.0 Veneto, Italy 400.969943 10.436683

Multiple entries in the daily reports are due both to the existence of offshore territories (as for Denmark) and to the breaking the data of certain Countries into different areas (as for Italy).

In the time series the following applies.

For France, Netherlands and Denmark, in order to get the data related to the main land it is enough to search for Country_Region = countryname and Province_State = NaN.

The same can be done for the United Kingdom. This excludes the Isle of Man and Channel Islands.

For Australia, it is enough to sum up all the entries where Country_Region = countryname. This includes Tasmania.

The same can be done for China and this will include also Hainan and Hong Kong.

For Canada, summing all the entries include also the people from Diamond Princess and Grand Princes ships, we well as Prince Edward Island population.

In [34]:
print("Population of different Countries in million (source: Google):\n\n",
     "Italy:", italy_pop, "\n",
     "Spain:", spain_pop, "\n",
     "Germany:", germany_pop, "\n",
     "France:", france_pop, "\n",
     "Switzerland:", switzerland_pop, "\n",
     "Netherlands:", netherlands_pop, "\n",
     "Austria:", austria_pop, "\n",
     "Belgium:", belgium_pop, "\n",
     "Portugal:", portugal_pop, "\n",
     "Luxembourg:", luxembourg_pop, "\n",
     "Poland:", poland_pop, "\n",
     "Ireland:", ireland_pop, "\n",
     "Estonia:", estonia_pop, "\n",
     "Denmark:", denmark_pop, "\n",
     "Norway:", norway_pop, "\n",
     "Sweden:", sweden_pop, "\n",
     "Iceland:", iceland_pop, "\n",
     "Finland:", finland_pop, "\n",
     "UK:", uk_pop, "\n",
     "Brazil:", brazil_pop, "\n",
     "Russia:", russia_pop, "\n",
     "India:", india_pop, "\n")

print("NOTE: those figures are approximative.")
Population of different Countries in million (source: Google):

 Italy: 60.48 
 Spain: 46.66 
 Germany: 82.79 
 France: 66.99 
 Switzerland: 8.57 
 Netherlands: 17.18 
 Austria: 8.822 
 Belgium: 11.4 
 Portugal: 10.29 
 Luxembourg: 0.602 
 Poland: 37.97 
 Ireland: 4.904 
 Estonia: 1.328 
 Denmark: 5.603 
 Norway: 5.368 
 Sweden: 10.12 
 Iceland: 0.364 
 Finland: 5.513 
 UK: 66.44 
 Brazil: 212.559 
 Russia: 145.9 
 India: 1380 

NOTE: those figures are approximative.
In [35]:
print("Density of population of different Countries in people per square kilometre\n"\
      "(source: Google):\n\n",
     "Italy:", italy_dens, "\n",
     "Spain:", spain_dens, "\n",
     "Germany:", germany_dens, "\n",
     "France:", france_dens, "\n",
     "Switzerland:", switzerland_dens, "\n",
     "Netherlands:", netherlands_dens, "\n",
     "Austria:", austria_dens, "\n",
     "Belgium:", belgium_dens, "\n",
     "Portugal:", portugal_dens, "\n",
     "Luxembourg:", luxembourg_dens, "\n",
     "Poland:", poland_dens, "\n",
     "Ireland:", ireland_dens, "\n",
     "Estonia:", poland_dens, "\n",
     "Denmark:", denmark_dens, "\n",
     "Norway:", norway_dens, "\n",
     "Sweden:", sweden_dens, "\n",
     "Iceland:", iceland_dens, "\n",
     "Finland:", finland_dens, "\n",
     "UK:", uk_dens, "\n",
     "Brazil:", brazil_dens, "\n",
     "Russia:", russia_dens, "\n",
     "India:", india_dens, "\n")

print("NOTE: those figures are approximative.")
Density of population of different Countries in people per square kilometre
(source: Google):

 Italy: 201.3 
 Spain: 91.4 
 Germany: 240 
 France: 122.34 
 Switzerland: 219 
 Netherlands: 488 
 Austria: 109 
 Belgium: 383 
 Portugal: 111 
 Luxembourg: 242 
 Poland: 124 
 Ireland: 72 
 Estonia: 124 
 Denmark: 134 
 Norway: 15 
 Sweden: 25 
 Iceland: 3 
 Finland: 15 
 UK: 274 
 Brazil: 25 
 Russia: 8.54 
 India: 464 

NOTE: those figures are approximative.
In [36]:
print("Median age of different Countries (source: Wikipedia):\n\n",
      "Finland:", finland_median_age, "\n",
      "Denmark:", denmark_median_age, "\n",
      "Norwayd:", norway_median_age, "\n",
      "Sweden:", sweden_median_age, "\n",
      "Iceland:", iceland_median_age, "\n",
      "Italy:", italy_median_age, "\n",
      "Spain:", spain_median_age, "\n",
      "France:", france_median_age, "\n",
      "Switzerland:", switzerland_median_age, "\n",
      "Netherlands:", netherlands_median_age, "\n",
      "Austria:", austria_median_age, "\n",
      "Belgium:", belgium_median_age, "\n",
      "Portugal:", portugal_median_age, "\n",
      "Luxembourg:", luxembourg_median_age, "\n",
      "Polandd:", poland_median_age, "\n",
      "Ireland:", ireland_median_age, "\n",
      "Estonia:", estonia_median_age, "\n",
      "Brazil:", brazil_median_age, "\n",
      "Russia:", russia_median_age, "\n",
      "India:", india_median_age, "\n")
      
print("NOTE: those figures are from year 2018.")
Median age of different Countries (source: Wikipedia):

 Finland: 42.5 
 Denmark: 42.2 
 Norwayd: 39.2 
 Sweden: 41.2 
 Iceland: 36.5 
 Italy: 45.5 
 Spain: 42.7 
 France: 41.4 
 Switzerland: 42.4 
 Netherlands: 42.6 
 Austria: 44.0 
 Belgium: 41.4 
 Portugal: 42.2 
 Luxembourg: 39.3 
 Polandd: 39.7 
 Ireland: 36.5 
 Estonia: 41.6 
 Brazil: 31.4 
 Russia: 38.6 
 India: 26.8 

NOTE: those figures are from year 2018.
In [37]:
pd.options.display.max_colwidth = 150

print("Containment actions by the Finnish Government:\n")
# Setting both text and column headers text aligned to the left
# and omitting the indexes
measures.style.set_properties(**{'text-align': 'left'}).\
set_table_styles([ dict(selector='th', props=[('text-align', 'left')] ) ]).hide_index()
Containment actions by the Finnish Government:

Out[37]:
Date Actions
12.3. First containment measures: gathering of more than 500 people banned
16.3. State of emergency declared: closing shools, universities, museums, theatres, libraries, sport facilities; gathering of more than 10 people banned
28.3. Additional containment measures: Uusimaa region borders closed, restaurant dining forbidden
11.4. Additional containment measures: No passengers in ships from Germany, Sweden, Estonia
15.4. First releasing measures: Uusima border re-opened
14.5. More releasing misures: schools opening, business travell allowed within Schengen
1.6. Further releasing: gathering up to 50 people allowed, reopening of bars and restaurants, reopening of museums and theatres
15.6. End of state of emergency

5.3. Data Cleansing

In [38]:
# Fixing the errors in the original data in the Active column
daily_report['Active'] = daily_report['Confirmed'] - daily_report['Deaths'] - daily_report['Recovered']
In [39]:
# Converting null values in strings with value "Not applicable"
world_conf_clean = world_confirmed.fillna("Not applicable")
world_recov_clean = world_recovered.fillna("Not applicable")
world_deceas_clean = world_deceased.fillna("Not applicable")
daily_rep_clean = daily_report.fillna("Not applicable")

5.4. Data Preparation

5.4.1. New datasets with no NaN, no GPS coordinates / list of days / list of Countries

In [40]:
# Dropping the GPS coordinates and storing the result in new datasets
world_conf_short = world_conf_clean.drop(['Lat', 'Long'], axis=1)
world_recov_short = world_recov_clean.drop(['Lat', 'Long'], axis=1)
world_deceas_short = world_deceas_clean.drop(['Lat', 'Long'], axis=1)
# Dropping the columns not related to the cases counters
daily_rep_short = daily_rep_clean.drop(['Lat',
                                        'Long_',
                                        'Last_Update',
                                        'FIPS',
                                        'Admin2',
                                        'Combined_Key'],\
                                       axis=1)

# Grouping by Province/State and storing the result in new datasets
world_conf_group = world_conf_short.groupby(['Country/Region']).sum()
world_recov_group = world_recov_short.groupby(['Country/Region']).sum()
world_deceas_group = world_deceas_short.groupby(['Country/Region']).sum()
daily_rep_group = daily_rep_short.groupby(['Country_Region']).sum()
In [41]:
# Creating a list of dates

# Extracting only the columns containing the virus cases data for each day
world_conf_data = world_confirmed.iloc[:,4:]
# Extracting the column values (dates) and putting them in a list
days_all = world_conf_data.columns.values.tolist()

# Initializing an empty list
days_tot = []
# Looping through the number of days
for i in list(range(len(days_all))):
    # Extracting day and month and taking just the string value
    new_element=re.findall("[0-9]+[/][0-9]+", days_all[i])[0]
    # Adding the result to the list
    days_tot.append(new_element)
    
print("List of days for the plots:\n")
days_tot
List of days for the plots:

Out[41]:
['1/22',
 '1/23',
 '1/24',
 '1/25',
 '1/26',
 '1/27',
 '1/28',
 '1/29',
 '1/30',
 '1/31',
 '2/1',
 '2/2',
 '2/3',
 '2/4',
 '2/5',
 '2/6',
 '2/7',
 '2/8',
 '2/9',
 '2/10',
 '2/11',
 '2/12',
 '2/13',
 '2/14',
 '2/15',
 '2/16',
 '2/17',
 '2/18',
 '2/19',
 '2/20',
 '2/21',
 '2/22',
 '2/23',
 '2/24',
 '2/25',
 '2/26',
 '2/27',
 '2/28',
 '2/29',
 '3/1',
 '3/2',
 '3/3',
 '3/4',
 '3/5',
 '3/6',
 '3/7',
 '3/8',
 '3/9',
 '3/10',
 '3/11',
 '3/12',
 '3/13',
 '3/14',
 '3/15',
 '3/16',
 '3/17',
 '3/18',
 '3/19',
 '3/20',
 '3/21',
 '3/22',
 '3/23',
 '3/24',
 '3/25',
 '3/26',
 '3/27',
 '3/28',
 '3/29',
 '3/30',
 '3/31',
 '4/1',
 '4/2',
 '4/3',
 '4/4',
 '4/5',
 '4/6',
 '4/7',
 '4/8',
 '4/9',
 '4/10',
 '4/11',
 '4/12',
 '4/13',
 '4/14',
 '4/15',
 '4/16',
 '4/17',
 '4/18',
 '4/19',
 '4/20',
 '4/21',
 '4/22',
 '4/23',
 '4/24',
 '4/25',
 '4/26',
 '4/27',
 '4/28',
 '4/29',
 '4/30',
 '5/1',
 '5/2',
 '5/3',
 '5/4',
 '5/5',
 '5/6',
 '5/7',
 '5/8',
 '5/9',
 '5/10',
 '5/11',
 '5/12',
 '5/13',
 '5/14',
 '5/15',
 '5/16',
 '5/17',
 '5/18',
 '5/19',
 '5/20',
 '5/21',
 '5/22',
 '5/23',
 '5/24',
 '5/25',
 '5/26',
 '5/27',
 '5/28',
 '5/29',
 '5/30',
 '5/31',
 '6/1',
 '6/2',
 '6/3',
 '6/4',
 '6/5',
 '6/6',
 '6/7',
 '6/8',
 '6/9',
 '6/10',
 '6/11',
 '6/12',
 '6/13',
 '6/14',
 '6/15',
 '6/16',
 '6/17',
 '6/18',
 '6/19',
 '6/20',
 '6/21',
 '6/22',
 '6/23',
 '6/24',
 '6/25',
 '6/26',
 '6/27',
 '6/28',
 '6/29',
 '6/30',
 '7/1',
 '7/2',
 '7/3',
 '7/4',
 '7/5',
 '7/6',
 '7/7',
 '7/8',
 '7/9',
 '7/10',
 '7/11',
 '7/12',
 '7/13',
 '7/14',
 '7/15',
 '7/16',
 '7/17',
 '7/18',
 '7/19',
 '7/20',
 '7/21']
In [42]:
# Listing the Countries
print("List of Countries:\n")
world_conf_group.index.to_list()
List of Countries:

Out[42]:
['Afghanistan',
 'Albania',
 'Algeria',
 'Andorra',
 'Angola',
 'Antigua and Barbuda',
 'Argentina',
 'Armenia',
 'Australia',
 'Austria',
 'Azerbaijan',
 'Bahamas',
 'Bahrain',
 'Bangladesh',
 'Barbados',
 'Belarus',
 'Belgium',
 'Belize',
 'Benin',
 'Bhutan',
 'Bolivia',
 'Bosnia and Herzegovina',
 'Botswana',
 'Brazil',
 'Brunei',
 'Bulgaria',
 'Burkina Faso',
 'Burma',
 'Burundi',
 'Cabo Verde',
 'Cambodia',
 'Cameroon',
 'Canada',
 'Central African Republic',
 'Chad',
 'Chile',
 'China',
 'Colombia',
 'Comoros',
 'Congo (Brazzaville)',
 'Congo (Kinshasa)',
 'Costa Rica',
 "Cote d'Ivoire",
 'Croatia',
 'Cuba',
 'Cyprus',
 'Czechia',
 'Denmark',
 'Diamond Princess',
 'Djibouti',
 'Dominica',
 'Dominican Republic',
 'Ecuador',
 'Egypt',
 'El Salvador',
 'Equatorial Guinea',
 'Eritrea',
 'Estonia',
 'Eswatini',
 'Ethiopia',
 'Fiji',
 'Finland',
 'France',
 'Gabon',
 'Gambia',
 'Georgia',
 'Germany',
 'Ghana',
 'Greece',
 'Grenada',
 'Guatemala',
 'Guinea',
 'Guinea-Bissau',
 'Guyana',
 'Haiti',
 'Holy See',
 'Honduras',
 'Hungary',
 'Iceland',
 'India',
 'Indonesia',
 'Iran',
 'Iraq',
 'Ireland',
 'Israel',
 'Italy',
 'Jamaica',
 'Japan',
 'Jordan',
 'Kazakhstan',
 'Kenya',
 'Korea, South',
 'Kosovo',
 'Kuwait',
 'Kyrgyzstan',
 'Laos',
 'Latvia',
 'Lebanon',
 'Lesotho',
 'Liberia',
 'Libya',
 'Liechtenstein',
 'Lithuania',
 'Luxembourg',
 'MS Zaandam',
 'Madagascar',
 'Malawi',
 'Malaysia',
 'Maldives',
 'Mali',
 'Malta',
 'Mauritania',
 'Mauritius',
 'Mexico',
 'Moldova',
 'Monaco',
 'Mongolia',
 'Montenegro',
 'Morocco',
 'Mozambique',
 'Namibia',
 'Nepal',
 'Netherlands',
 'New Zealand',
 'Nicaragua',
 'Niger',
 'Nigeria',
 'North Macedonia',
 'Norway',
 'Oman',
 'Pakistan',
 'Panama',
 'Papua New Guinea',
 'Paraguay',
 'Peru',
 'Philippines',
 'Poland',
 'Portugal',
 'Qatar',
 'Romania',
 'Russia',
 'Rwanda',
 'Saint Kitts and Nevis',
 'Saint Lucia',
 'Saint Vincent and the Grenadines',
 'San Marino',
 'Sao Tome and Principe',
 'Saudi Arabia',
 'Senegal',
 'Serbia',
 'Seychelles',
 'Sierra Leone',
 'Singapore',
 'Slovakia',
 'Slovenia',
 'Somalia',
 'South Africa',
 'South Sudan',
 'Spain',
 'Sri Lanka',
 'Sudan',
 'Suriname',
 'Sweden',
 'Switzerland',
 'Syria',
 'Taiwan*',
 'Tajikistan',
 'Tanzania',
 'Thailand',
 'Timor-Leste',
 'Togo',
 'Trinidad and Tobago',
 'Tunisia',
 'Turkey',
 'US',
 'Uganda',
 'Ukraine',
 'United Arab Emirates',
 'United Kingdom',
 'Uruguay',
 'Uzbekistan',
 'Venezuela',
 'Vietnam',
 'West Bank and Gaza',
 'Western Sahara',
 'Yemen',
 'Zambia',
 'Zimbabwe']

5.4.2. Population age data

In [43]:
# Creating a Pandas series containing median ages for different Countries
countries_median_age = pd.Series({'Finland': finland_median_age,
                                  'Denmark': denmark_median_age,
                                  'Norway': norway_median_age,
                                  'Sweden': sweden_median_age,
                                  'Iceland': iceland_median_age,
                                  'Italy': italy_median_age,
                                  'Spain': spain_median_age,
                                  'Germany': germany_median_age,
                                  'France': france_median_age,
                                  'Switzerland': switzerland_median_age,
                                  'Netherlands': netherlands_median_age,
                                  'Austria': austria_median_age,
                                  'Belgium': belgium_median_age,
                                  'Portugal': portugal_median_age,
                                  'Luxembourg': luxembourg_median_age,
                                  'Poland': poland_median_age,
                                  'Ireland': ireland_median_age,
                                  'Estonia': estonia_median_age,
                                  'UK': uk_median_age,
                                  'US': us_median_age,
                                  'Brazil': brazil_median_age,
                                  'Russia': russia_median_age,
                                  'India': india_median_age})
# Calculating the minimum value
median_age_min = countries_median_age.min()
# Calculating the maximum value
median_age_max = countries_median_age.max()
# Calculating the median age range
median_age_range = median_age_max - median_age_min
print("The range of the median age in the Countries that are analyzed here is: "\
      "{:.1f} years"\
     .format(median_age_range))

# Extracting data related to Scandinavia
scand_median_age = countries_median_age[['Finland', 'Denmark', 'Norway', 'Sweden', 'Iceland']]
# Calculating the median age range
scand_median_age_range = scand_median_age.max() - scand_median_age.min()
print("The range of the median age in the Scandinavian Countries that are analyzed here is: "\
      "{:.1f} years"\
     .format(scand_median_age_range))

# Extracting data related to EU
eu_median_age = countries_median_age[['Finland', 'Denmark', 'Sweden',
                                      'Italy', 'Spain', 'Germany', 'France',
                                      'Netherlands', 'Austria', 'Belgium', 'Portugal',
                                      'Luxembourg', 'Poland', 'Ireland', 'Estonia']]
# Calculating the median age range
eu_median_age_range = eu_median_age.max() - eu_median_age.min()
print("The range of the median age in the EU Countries that are analyzed here is: "\
      "{:.1f} years"\
     .format(eu_median_age_range))
The range of the median age in the Countries that are analyzed here is: 18.9 years
The range of the median age in the Scandinavian Countries that are analyzed here is: 6.0 years
The range of the median age in the EU Countries that are analyzed here is: 9.2 years

5.4.3. World Data

In [44]:
# Selecting only the columns with the daily data
world_conf = world_conf_short.iloc[:,2:]
world_recov = world_recov_short.iloc[:,2:]
world_deceas = world_deceas_short.iloc[:,2:]
In [45]:
# Calculating cumulative worldwide data for each day
world_conf_tot = world_conf.sum()
world_recov_tot = world_recov.sum()
world_deceas_tot = world_deceas.sum()
In [46]:
# Calculating the active cases for each day
world_act_tot = list(np.array(world_conf_tot) - \
                     np.array(world_recov_tot) - \
                     np.array(world_deceas_tot))
In [47]:
# Calculating the daily increments in the deceased cases
world_conf_incr = calc_increments(world_conf_tot)
# Calculating the daily increments in the confirmed cases
world_deceas_incr = calc_increments(world_deceas_tot)
In [48]:
# Finding the cumulative per capita data worldwide
world_conf_perc = pop_perc(world_conf_tot, 7.8*1000)
world_deceas_perc = pop_perc(world_deceas_tot, 7.8*1000)

5.4.4. Finnish data

In [49]:
# Calling the function extract_country to extract data related to Finland
# (skipping the first 6 days since they contain no confirmed cases)
Finland_6 = extract_country("Finland", "Not applicable", 6)
# Extracting the confirmed cases
finland_conf_6 = Finland_6[0]
# Extracting the recovered cases
finland_recov_6 = Finland_6[1]
# Extracting the decased cases
finland_deceas_6 = Finland_6[2]

# Creating a list of days to use for Finnish charts
# (skipping the first 6 days)
days_fin = days_tot[6:]
In [50]:
print("Compact Finnish data set:\n")
print("first day:", days_fin[0])
print("number of days:", len(days_fin))
Compact Finnish data set:

first day: 1/28
number of days: 176
In [51]:
# Visualizing the complete series
print("Confirmed cases time series:")
finland_conf_6
Confirmed cases time series:
Out[51]:
[0,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 2,
 2,
 2,
 3,
 6,
 6,
 6,
 6,
 12,
 15,
 15,
 23,
 30,
 40,
 59,
 59,
 155,
 225,
 244,
 277,
 321,
 336,
 400,
 450,
 523,
 626,
 700,
 792,
 880,
 958,
 1041,
 1167,
 1240,
 1352,
 1418,
 1446,
 1518,
 1615,
 1882,
 1927,
 2176,
 2308,
 2487,
 2605,
 2769,
 2905,
 2974,
 3064,
 3161,
 3237,
 3369,
 3489,
 3681,
 3783,
 3868,
 4014,
 4129,
 4284,
 4395,
 4475,
 4576,
 4695,
 4740,
 4906,
 4995,
 5051,
 5176,
 5254,
 5327,
 5412,
 5573,
 5673,
 5738,
 5880,
 5962,
 5984,
 6003,
 6054,
 6145,
 6228,
 6286,
 6347,
 6380,
 6399,
 6443,
 6493,
 6537,
 6568,
 6579,
 6599,
 6628,
 6692,
 6743,
 6776,
 6826,
 6859,
 6885,
 6887,
 6911,
 6911,
 6941,
 6964,
 6981,
 7001,
 7025,
 7040,
 7064,
 7073,
 7087,
 7104,
 7108,
 7112,
 7117,
 7119,
 7133,
 7142,
 7143,
 7144,
 7155,
 7167,
 7172,
 7191,
 7198,
 7198,
 7209,
 7214,
 7236,
 7241,
 7242,
 7248,
 7253,
 7257,
 7262,
 7265,
 7273,
 7279,
 7291,
 7294,
 7295,
 7301,
 7296,
 7293,
 7301,
 7318,
 7335,
 7340,
 7351]
In [52]:
# Visualizing the complete series
print("Recovered cases time series:")
finland_recov_6
Recovered cases time series:
Out[52]:
[0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 10,
 10,
 10,
 10,
 10,
 10,
 10,
 10,
 10,
 10,
 10,
 10,
 10,
 10,
 10,
 10,
 10,
 10,
 300,
 300,
 300,
 300,
 300,
 300,
 300,
 300,
 300,
 300,
 300,
 300,
 300,
 300,
 1700,
 1700,
 1700,
 1700,
 2000,
 2000,
 2000,
 2000,
 2500,
 2500,
 2500,
 2500,
 2800,
 2800,
 3000,
 3000,
 3000,
 3000,
 3500,
 3500,
 3500,
 3500,
 4000,
 4000,
 4000,
 4000,
 4300,
 4300,
 4300,
 5000,
 5000,
 5000,
 5000,
 5000,
 4800,
 4800,
 4800,
 4800,
 4800,
 5100,
 5100,
 5100,
 5500,
 5500,
 5500,
 5500,
 5500,
 5500,
 5500,
 5800,
 5800,
 5800,
 5800,
 5800,
 5800,
 5800,
 6200,
 6200,
 6200,
 6200,
 6200,
 6200,
 6200,
 6200,
 6200,
 6200,
 6200,
 6400,
 6400,
 6600,
 6600,
 6600,
 6600,
 6600,
 6600,
 6600,
 6700,
 6700,
 6700,
 6700,
 6700,
 6700,
 6700,
 6800,
 6800,
 6800,
 6800,
 6800,
 6800,
 6800,
 6880,
 6880,
 6880,
 6880,
 6880,
 6880,
 6880]
In [53]:
# Visualizing the complete series
print("Deceased cases time series:")
finland_deceas_6
Deceased cases time series:
Out[53]:
[0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 1,
 1,
 1,
 1,
 3,
 5,
 7,
 9,
 11,
 13,
 17,
 17,
 19,
 20,
 25,
 28,
 27,
 34,
 40,
 42,
 48,
 49,
 56,
 59,
 64,
 72,
 75,
 82,
 90,
 94,
 98,
 141,
 149,
 172,
 177,
 186,
 190,
 193,
 199,
 206,
 211,
 218,
 220,
 230,
 240,
 246,
 252,
 255,
 260,
 265,
 267,
 271,
 275,
 284,
 287,
 293,
 297,
 298,
 300,
 301,
 304,
 306,
 306,
 306,
 307,
 308,
 312,
 313,
 313,
 314,
 316,
 320,
 318,
 320,
 321,
 322,
 322,
 322,
 323,
 323,
 324,
 324,
 325,
 325,
 325,
 326,
 326,
 326,
 326,
 326,
 326,
 326,
 326,
 327,
 327,
 327,
 327,
 328,
 328,
 328,
 328,
 328,
 328,
 328,
 329,
 329,
 329,
 329,
 329,
 329,
 329,
 329,
 329,
 329,
 329,
 329,
 328,
 328,
 328,
 328,
 328,
 328,
 328]
In [54]:
# Calculating the active cases
finland_act_6 = list(np.array(finland_conf_6) - \
                     np.array(finland_recov_6) - \
                     np.array(finland_deceas_6))

finland_act_6
Out[54]:
[0,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 1,
 1,
 1,
 2,
 5,
 5,
 5,
 5,
 11,
 14,
 14,
 22,
 29,
 39,
 58,
 58,
 154,
 224,
 234,
 267,
 311,
 326,
 390,
 440,
 512,
 615,
 689,
 781,
 867,
 943,
 1024,
 1148,
 1219,
 1329,
 1391,
 1419,
 1199,
 1295,
 1557,
 1599,
 1849,
 1974,
 2147,
 2263,
 2421,
 2556,
 2618,
 2705,
 2797,
 2865,
 1594,
 1707,
 1891,
 1989,
 1770,
 1873,
 1980,
 2112,
 1718,
 1789,
 1886,
 2002,
 1741,
 1900,
 1784,
 1833,
 1956,
 2024,
 1587,
 1666,
 1821,
 1918,
 1478,
 1615,
 1695,
 1713,
 1428,
 1470,
 1558,
 935,
 989,
 1049,
 1080,
 1098,
 1339,
 1387,
 1431,
 1462,
 1472,
 1191,
 1216,
 1279,
 930,
 962,
 1010,
 1039,
 1067,
 1067,
 1090,
 789,
 819,
 842,
 858,
 878,
 901,
 916,
 539,
 548,
 562,
 578,
 582,
 586,
 591,
 593,
 607,
 616,
 617,
 417,
 428,
 240,
 245,
 263,
 270,
 270,
 281,
 286,
 208,
 213,
 213,
 219,
 224,
 228,
 233,
 136,
 144,
 150,
 162,
 165,
 166,
 172,
 88,
 85,
 93,
 110,
 127,
 132,
 143]
In [55]:
# Creating a list of same lenght as days_fin containing the increment of
# the confirmed cases compared to the previous day (first derivate)
# This tells how quickly the confirmed cases are growing
finland_conf_incr_6 = calc_increments(finland_conf_6)

# Visualizing the all series
print("Daily increment in confirmed cases time series:")
finland_conf_incr_6
Daily increment in confirmed cases time series:
Out[55]:
[0.0,
 1,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 1,
 0,
 0,
 1,
 3,
 0,
 0,
 0,
 6,
 3,
 0,
 8,
 7,
 10,
 19,
 0,
 96,
 70,
 19,
 33,
 44,
 15,
 64,
 50,
 73,
 103,
 74,
 92,
 88,
 78,
 83,
 126,
 73,
 112,
 66,
 28,
 72,
 97,
 267,
 45,
 249,
 132,
 179,
 118,
 164,
 136,
 69,
 90,
 97,
 76,
 132,
 120,
 192,
 102,
 85,
 146,
 115,
 155,
 111,
 80,
 101,
 119,
 45,
 166,
 89,
 56,
 125,
 78,
 73,
 85,
 161,
 100,
 65,
 142,
 82,
 22,
 19,
 51,
 91,
 83,
 58,
 61,
 33,
 19,
 44,
 50,
 44,
 31,
 11,
 20,
 29,
 64,
 51,
 33,
 50,
 33,
 26,
 2,
 24,
 0,
 30,
 23,
 17,
 20,
 24,
 15,
 24,
 9,
 14,
 17,
 4,
 4,
 5,
 2,
 14,
 9,
 1,
 1,
 11,
 12,
 5,
 19,
 7,
 0,
 11,
 5,
 22,
 5,
 1,
 6,
 5,
 4,
 5,
 3,
 8,
 6,
 12,
 3,
 1,
 6,
 -5,
 -3,
 8,
 17,
 17,
 5,
 11]
In [56]:
# Calculating the incremental values of the deceased cases
finland_deceas_incr_6 = calc_increments(finland_deceas_6)
In [57]:
# Extracting all data about Finland (from the first available day)
Finland_0 = extract_country("Finland", "Not applicable", 0)
# Extracting the confirmed cases
finland_conf_0 = Finland_0[0]
# Extracting the recovered cases
finland_recov_0 = Finland_0[1]
# Extracting the decased cases
finland_deceas_0 = Finland_0[2]

# Calculating the incremental values of the confirmed cases
finland_conf_incr_0 = calc_increments(finland_conf_0)
# Calculating the incremental values of the deceased cases
finland_deceas_incr_0 = calc_increments(finland_deceas_0)

# Extracting the dataseries from the first confirmed case in the Country
# by using the function extract_non_null
# (the function extracts all non null values, not only the leading zeros
# but this is OK since the total confirmed cases cannot decrease)

finland_conf_pos = extract_non_null(finland_conf_0)
In [58]:
# Using the function pop_perc to calculate the confirmed cumulative cases
# in percentage of the total population
finland_conf_0_perc = pop_perc(finland_conf_0, finland_pop)
# Doing the same for the deceased cases
finland_deceas_0_perc = pop_perc(finland_deceas_0, finland_pop)

5.4.5. Data from other Scandinavian Countries and Estonia

In [59]:
# Preparing data related to the other Scandinavian Countries and Estonia

# 1. Skipping the first 6 days of the time series

# Denmark
# Calling the function prep_country_data to extract data related to the Country
denmark_6 = prep_country_data("Denmark", denmark_pop, "Not applicable", 6)
# Cumulative confirmed cases
denmark_conf_6 = denmark_6[0]
# Cumulative recovered cases
denmark_recov_6 = denmark_6[1]
# Cumulative deceased cases
denmark_deceas_6 = denmark_6[2]
# Daily confirmed cases
denmark_conf_incr_6 = denmark_6[4]

# Norway
norway_6 = prep_country_data("Norway", norway_pop, "Not applicable", 6)
norway_conf_6 = norway_6[0]
norway_recov_6 = norway_6[1]
norway_deceas_6 = norway_6[2]
norway_conf_incr_6 = norway_6[4]

# Sweden
sweden_6 = prep_country_data("Sweden", sweden_pop, "Not applicable", 6)
sweden_conf_6 = sweden_6[0]
sweden_recov_6 = sweden_6[1]
sweden_deceas_6 = sweden_6[2]
sweden_conf_incr_6 = sweden_6[4]

# Iceland
iceland_6 = prep_country_data("Iceland", iceland_pop, "Not applicable", 6)
iceland_conf_6 = iceland_6[0]
iceland_recov_6 = iceland_6[1]
iceland_deceas_6 = iceland_6[2]
iceland_conf_incr_6 = iceland_6[4]

# Estonia
estonia_6 = prep_country_data("Estonia", estonia_pop, "Not applicable", 6)
estonia_conf_6 = estonia_6[0]
estonia_recov_6 = estonia_6[1]
estonia_deceas_6 = estonia_6[2]
estonia_conf_incr_6 = estonia_6[4]

# 2. complete time series

# Denmark
# Calling the function prep_country_data to extract data related to the Country
denmark_0 = prep_country_data("Denmark", denmark_pop, "Not applicable", 0)
# Cumulative confirmed cases
denmark_conf_0 = denmark_0[0]
# Cumulative recovered cases
denmark_recov_0 = denmark_0[1]
# Cumulative deceased cases
denmark_deceas_0 = denmark_0[2]
# Cumulative active cases
denmark_act_0 = denmark_0[3]
# Daily confirmed cases
denmark_conf_incr_0 = denmark_0[4]
# Daily deceased cases
denmark_deceas_incr_0 = denmark_0[5]
# Cumulative confirmed cases starting from the day of the first positive case
denmark_conf_pos = denmark_0[6]
# Cumulative confirmed cases per capita
denmark_conf_0_perc = denmark_0[7]
# Cumulative deceased cases per capita
denmark_deceas_0_perc = denmark_0[8]

# Norway
norway_0 = prep_country_data("Norway", norway_pop, "Not applicable", 0)
norway_conf_0 = norway_0[0]
norway_recov_0 = norway_0[1]
norway_deceas_0 = norway_0[2]
norway_act_0 = norway_0[3]
norway_conf_incr_0 = norway_0[4]
norway_deceas_incr_0 = norway_0[5]
norway_conf_pos = norway_0[6]
norway_conf_0_perc = norway_0[7]
norway_deceas_0_perc = norway_0[8]

# Sweden
sweden_0 = prep_country_data("Sweden", sweden_pop, "Not applicable", 0)
sweden_conf_0 = sweden_0[0]
sweden_recov_0 = sweden_0[1]
sweden_deceas_0 = sweden_0[2]
sweden_act_0 = sweden_0[3]
sweden_conf_incr_0 = sweden_0[4]
sweden_deceas_incr_0 = sweden_0[5]
sweden_conf_pos = sweden_0[6]
sweden_conf_0_perc = sweden_0[7]
sweden_deceas_0_perc = sweden_0[8]

# Iceland
iceland_0 = prep_country_data("Iceland", iceland_pop, "Not applicable", 0)
iceland_conf_0 = iceland_0[0]
iceland_recov_0 = iceland_0[1]
iceland_deceas_0 = iceland_0[2]
iceland_act_0 = iceland_0[3]
iceland_conf_incr_0 = iceland_0[4]
iceland_deceas_incr_0 = iceland_0[5]
iceland_conf_pos = iceland_0[6]
iceland_conf_0_perc = iceland_0[7]
iceland_deceas_0_perc = iceland_0[8]

# Estonia
estonia_0 = prep_country_data("Estonia", estonia_pop, "Not applicable", 0)
estonia_conf_0 = estonia_0[0]
estonia_recov_0 = estonia_0[1]
estonia_deceas_0 = estonia_0[2]
estonia_act_0 = estonia_0[3]
estonia_conf_incr_0 = estonia_0[4]
estonia_deceas_incr_0 = estonia_0[5]
estonia_conf_pos = estonia_0[6]
estonia_conf_0_perc = estonia_0[7]
estonia_deceas_0_perc = estonia_0[8]

5.4.6. Data from other European Countries

In [60]:
# Calling the function prep_country_data to extract data related to Italy
italy_0 = prep_country_data("Italy", italy_pop, "Not applicable", 0)
# Cumulative confirmed cases
italy_conf_0 = italy_0[0]
# Cumulative recovered cases
italy_recov_0 = italy_0[1]
# Cumulative deceased cases
italy_deceas_0 = italy_0[2]
# Cumulative active cases
italy_act_0 = italy_0[3]
# Daily confirmed cases
italy_conf_incr_0 = italy_0[4]
# Daily deceased cases
italy_deceas_incr_0 = italy_0[5]
# Cumulative confirmed cases starting from the day of the first positive case
italy_conf_pos = italy_0[6]
# Cumulative confirmed cases per capita
italy_conf_0_perc = italy_0[7]
# Cumulative deceased cases per capita
italy_deceas_0_perc = italy_0[8]
In [61]:
# Preparing data related to Spain
spain_0 = prep_country_data("Spain", spain_pop, "Not applicable", 0)
spain_conf_0 = spain_0[0]
spain_recov_0 = spain_0[1]
spain_deceas_0 = spain_0[2]
spain_act_0 = spain_0[3]
spain_conf_incr_0 = spain_0[4]
spain_deceas_incr_0 = spain_0[5]
spain_conf_pos = spain_0[6]
spain_conf_0_perc = spain_0[7]
spain_deceas_0_perc = spain_0[8]
In [62]:
# Preparing data related to Germany
germany_0 = prep_country_data("Germany", germany_pop, "Not applicable", 0)
germany_conf_0 = germany_0[0]
germany_recov_0 = germany_0[1]
germany_deceas_0 = germany_0[2]
germany_act_0 = germany_0[3]
germany_conf_incr_0 = germany_0[4]
germany_deceas_incr_0 = germany_0[5]
germany_conf_pos = germany_0[6]
germany_conf_0_perc = germany_0[7]
germany_deceas_0_perc = germany_0[8]
In [63]:
# Preparing data related to France
france_0 = prep_country_data("France", france_pop, "Not applicable", 0)
france_conf_0 = france_0[0]
france_recov_0 = france_0[1]
france_deceas_0 = france_0[2]
france_act_0 = france_0[3]
france_conf_incr_0 = france_0[4]
france_deceas_incr_0 = france_0[5]
france_conf_pos = france_0[6]
france_conf_0_perc = france_0[7]
france_deceas_0_perc = france_0[8]
In [64]:
# Preparing data related to Switzerland
switzerland_0 = prep_country_data("Switzerland", switzerland_pop, "Not applicable", 0)
switzerland_conf_0 = switzerland_0[0]
switzerland_recov_0 = switzerland_0[1]
switzerland_deceas_0 = switzerland_0[2]
switzerland_act_0 = switzerland_0[3]
switzerland_conf_incr_0 = switzerland_0[4]
switzerland_deceas_incr_0 = switzerland_0[5]
switzerland_conf_pos = switzerland_0[6]
switzerland_conf_0_perc = switzerland_0[7]
switzerland_deceas_0_perc = switzerland_0[8]
In [65]:
# Preparing data related to Netherlands
netherlands_0 = prep_country_data("Netherlands", netherlands_pop, "Not applicable", 0)
netherlands_conf_0 = netherlands_0[0]
netherlands_recov_0 = netherlands_0[1]
netherlands_deceas_0 = netherlands_0[2]
netherlands_act_0 = netherlands_0[3]
netherlands_conf_incr_0 = netherlands_0[4]
netherlands_deceas_incr_0 = netherlands_0[5]
netherlands_conf_pos = netherlands_0[6]
netherlands_conf_0_perc = netherlands_0[7]
netherlands_deceas_0_perc = netherlands_0[8]
In [66]:
# Preparing data related to Austria
austria_0 = prep_country_data("Austria", austria_pop, "Not applicable", 0)
austria_conf_0 = austria_0[0]
austria_recov_0 = austria_0[1]
austria_deceas_0 = austria_0[2]
austria_act_0 = austria_0[3]
austria_conf_incr_0 = austria_0[4]
austria_deceas_incr_0 = austria_0[5]
austria_conf_pos = austria_0[6]
austria_conf_0_perc = austria_0[7]
austria_deceas_0_perc = austria_0[8]
In [67]:
# Preparing data related to Belgium
belgium_0 = prep_country_data("Belgium", belgium_pop, "Not applicable", 0)
belgium_conf_0 = belgium_0[0]
belgium_recov_0 = belgium_0[1]
belgium_deceas_0 = belgium_0[2]
belgium_act_0 = belgium_0[3]
belgium_conf_incr_0 = belgium_0[4]
belgium_deceas_incr_0 = belgium_0[5]
belgium_conf_pos = belgium_0[6]
belgium_conf_0_perc = belgium_0[7]
belgium_deceas_0_perc = belgium_0[8]
In [68]:
# Preparing data related to Portugal
portugal_0 = prep_country_data("Portugal", portugal_pop, "Not applicable", 0)
portugal_conf_0 = portugal_0[0]
portugal_recov_0 = portugal_0[1]
portugal_deceas_0 = portugal_0[2]
portugal_act_0 = portugal_0[3]
portugal_conf_incr_0 = portugal_0[4]
portugal_deceas_incr_0 = portugal_0[5]
portugal_conf_pos = portugal_0[6]
portugal_conf_0_perc = portugal_0[7]
portugal_deceas_0_perc = portugal_0[8]
In [69]:
# Preparing data related to Luxembourg
luxembourg_0 = prep_country_data("Luxembourg", luxembourg_pop, "Not applicable", 0)
luxembourg_conf_0 = luxembourg_0[0]
luxembourg_recov_0 = luxembourg_0[1]
luxembourg_deceas_0 = luxembourg_0[2]
luxembourg_act_0 = luxembourg_0[3]
luxembourg_conf_incr_0 = luxembourg_0[4]
luxembourg_deceas_incr_0 = luxembourg_0[5]
luxembourg_conf_pos = luxembourg_0[6]
luxembourg_conf_0_perc = luxembourg_0[7]
luxembourg_deceas_0_perc = luxembourg_0[8]
In [70]:
# Preparing data related to Poland
poland_0 = prep_country_data("Poland", poland_pop, "Not applicable", 0)
poland_conf_0 = poland_0[0]
poland_recov_0 = poland_0[1]
poland_deceas_0 = poland_0[2]
poland_act_0 = poland_0[3]
poland_conf_incr_0 = poland_0[4]
poland_deceas_incr_0 = poland_0[5]
poland_conf_pos = poland_0[6]
poland_conf_0_perc = poland_0[7]
poland_deceas_0_perc = poland_0[8]
In [71]:
# Preparing data related to Ireland
ireland_0 = prep_country_data("Ireland", ireland_pop, "Not applicable", 0)
ireland_conf_0 = ireland_0[0]
ireland_recov_0 = ireland_0[1]
ireland_deceas_0 = ireland_0[2]
ireland_act_0 = ireland_0[3]
ireland_conf_incr_0 = ireland_0[4]
ireland_deceas_incr_0 = ireland_0[5]
ireland_conf_pos = ireland_0[6]
ireland_conf_0_perc = ireland_0[7]
ireland_deceas_0_perc = ireland_0[8]

5.4.7. Data from UK and US

In [72]:
# Preparing data related to UK
uk_0 = prep_country_data("United Kingdom", uk_pop, "Not applicable", 0)
uk_conf_0 = uk_0[0]
uk_recov_0 = uk_0[1]
uk_deceas_0 = uk_0[2]
uk_act_0 = uk_0[3]
uk_conf_incr_0 = uk_0[4]
uk_deceas_incr_0 = uk_0[5]
uk_conf_pos = uk_0[6]
uk_conf_0_perc = uk_0[7]
uk_deceas_0_perc = uk_0[8]
In [73]:
# Preparing data related to US
us_0 = prep_country_data("US", us_pop, "Not applicable", 0)
us_conf_0 = us_0[0]
us_recov_0 = us_0[1]
us_deceas_0 = us_0[2]
us_act_0 = us_0[3]
us_conf_incr_0 = us_0[4]
us_deceas_incr_0 = us_0[5]
us_conf_pos = us_0[6]
us_conf_0_perc = us_0[7]
us_deceas_0_perc = us_0[8]

5.4.8. Data from Brazil, Russia and India

In [74]:
# Preparing data related to Brazil
brazil_0 = prep_country_data("Brazil", brazil_pop, "Not applicable", 0)
brazil_conf_0 = brazil_0[0]
brazil_recov_0 = brazil_0[1]
brazil_deceas_0 = brazil_0[2]
brazil_act_0 = brazil_0[3]
brazil_conf_incr_0 = brazil_0[4]
brazil_deceas_incr_0 = brazil_0[5]
brazil_conf_pos = brazil_0[6]
brazil_conf_0_perc = brazil_0[7]
brazil_deceas_0_perc = brazil_0[8]
In [75]:
# Preparing data related to Russia
russia_0 = prep_country_data("Russia", russia_pop, "Not applicable", 0)
russia_conf_0 = russia_0[0]
russia_recov_0 = russia_0[1]
russia_deceas_0 = russia_0[2]
russia_act_0 = russia_0[3]
russia_conf_incr_0 = russia_0[4]
russia_deceas_incr_0 = russia_0[5]
russia_conf_pos = russia_0[6]
russia_conf_0_perc = russia_0[7]
russia_deceas_0_perc = russia_0[8]
In [76]:
# Preparing data related to India
india_0 = prep_country_data("India", india_pop, "Not applicable", 0)
india_conf_0 = india_0[0]
india_recov_0 = india_0[1]
india_deceas_0 = india_0[2]
india_act_0 = india_0[3]
india_conf_incr_0 = india_0[4]
india_deceas_incr_0 = india_0[5]
india_conf_pos = india_0[6]
india_conf_0_perc = india_0[7]
india_deceas_0_perc = india_0[8]

5.4.9. Data from China

In [77]:
# Daily Report from China broken by Provinces
daily_rep_short[daily_rep_short['Country_Region'] == "China"]
Out[77]:
Province_State Country_Region Confirmed Deaths Recovered Active Incidence_Rate Case-Fatality_Ratio
3250 Anhui China 991 6 985 0 1.56705 0.605449
3281 Beijing China 929 9 752 168 4.31291 0.968784
3329 Chongqing China 583 6 576 1 1.87943 1.02916
3364 Fujian China 363 1 361 1 0.921086 0.275482
3369 Gansu China 167 2 165 0 0.633295 1.1976
3385 Guangdong China 1650 8 1636 6 1.45426 0.484848
3386 Guangxi China 254 2 252 0 0.515631 0.787402
3389 Guizhou China 147 2 145 0 0.408333 1.36054
3392 Hainan China 171 6 165 0 1.83084 3.50877
3396 Hebei China 349 6 340 3 0.461885 1.7192
3397 Heilongjiang China 947 13 934 0 2.50994 1.37276
3398 Henan China 1276 22 1254 0 1.32847 1.72414
3404 Hong Kong China 1655 10 1254 391 22.0755 0.60423
3407 Hubei China 68135 4512 63623 0 115.151 6.62215
3409 Hunan China 1019 4 1015 0 1.47703 0.392542
3414 Inner Mongolia China 249 1 237 11 0.982636 0.401606
3427 Jiangsu China 654 0 654 0 0.812321 0
3428 Jiangxi China 932 1 931 0 2.00516 0.107296
3429 Jilin China 155 2 153 0 0.573225 1.29032
3474 Liaoning China 164 2 150 12 0.376233 1.21951
3485 Macau China 46 0 45 1 7.08409 0
3539 Ningxia China 75 0 75 0 1.09012 0
3596 Qinghai China 18 0 18 0 0.298507 0
3639 Shaanxi China 321 3 315 3 0.830745 0.934579
3640 Shandong China 793 7 785 1 0.78929 0.882724
3641 Shanghai China 732 7 691 34 3.0198 0.956284
3642 Shanxi China 201 0 198 3 0.540613 0
3646 Sichuan China 599 3 590 6 0.718139 0.500835
3674 Tianjin China 203 3 195 5 1.30128 1.47783
3675 Tibet China 1 0 1 0 0.0290698 0
3732 Xinjiang China 77 3 73 1 0.30961 3.8961
3740 Yunnan China 188 2 183 3 0.389234 1.06383
3746 Zhejiang China 1270 1 1267 2 2.2137 0.0787402
In [78]:
print("Number of entries related to China:")
len(daily_rep_short[daily_rep_short['Country_Region'] == "China"])
Number of entries related to China:
Out[78]:
33
In [79]:
# Extracting data related to Hubei province by screning out the text variables
# and putting the result in list format
hubei_conf_0 = world_conf_short[(world_conf_short['Country/Region'] == 'China') & \
                                (world_conf_short['Province/State'] == 'Hubei')]
hubei_conf_0 = hubei_conf_0.iloc[:, 2:].values.tolist()[0]
hubei_conf_incr_0 = calc_increments(hubei_conf_0)
hubei_conf_0_perc = pop_perc(hubei_conf_0, hubei_pop)

hubei_recov_0 = world_recov_short[(world_recov_short['Country/Region'] == 'China') & \
                                  (world_recov_short['Province/State'] == 'Hubei')]
hubei_recov_0 = hubei_recov_0.iloc[:, 2:].values.tolist()[0]

hubei_deceas_0 = world_deceas_short[(world_deceas_short['Country/Region'] == 'China') & \
                                    (world_deceas_short['Province/State'] == 'Hubei')]
hubei_deceas_0 = hubei_deceas_0.iloc[:, 2:].values.tolist()[0]

hubei_act_0 = list(np.array(hubei_conf_0) - \
                   np.array(hubei_recov_0) - \
                   np.array(hubei_deceas_0))

# Extracting data related to all the other provinces, making the sum
# and putting the result in list format
restchina_conf_0 = world_conf_short[(world_conf_short['Country/Region'] == 'China') & \
                                    (world_conf_short['Province/State'] !=  'Hubei')]
restchina_conf_0 = restchina_conf_0.groupby(['Country/Region']).sum()
restchina_conf_0 = restchina_conf_0.values.tolist()[0]
restchina_conf_incr_0 = calc_increments(restchina_conf_0)
restchina_conf_0_perc = pop_perc(restchina_conf_0, restchina_pop)

restchina_recov_0 = world_recov_short[(world_recov_short['Country/Region'] == 'China') & \
                                      (world_recov_short['Province/State'] !=  'Hubei')]
restchina_recov_0 = restchina_recov_0.groupby(['Country/Region']).sum()
restchina_recov_0 = restchina_recov_0.values.tolist()[0]

restchina_deceas_0 = world_deceas_short[(world_deceas_short['Country/Region'] == 'China') & \
                                        (world_deceas_short['Province/State'] !=  'Hubei')]
restchina_deceas_0 = restchina_deceas_0.groupby(['Country/Region']).sum()
restchina_deceas_0 = restchina_deceas_0.values.tolist()[0]

restchina_act_0 = list(np.array(restchina_conf_0) - \
                       np.array(restchina_recov_0) - \
                       np.array(restchina_deceas_0))

5.5. Summary of the Created Datasets

Within this document, different datasets are used for different purposes. This section provides a summary as a useful reference and describes the naming rules that have been used. Those variables that have been created temporarily just for reason of code clarity are not included in this list.

world_conf_clean

  • Dataframe based on world_confirmed (ime_series_covid19_confirmed_global.csv)
  • The NaN cells in the Province/State columns have been changed into strings with value "Not applicable"

world_recov_clean

  • Dataframe based on world_recovered (ime_series_covid19_recovered_global.csv)
  • The NaN cells in the Province/State columns have been changed into strings with value "Not applicable"

world_deceas_clean

  • Dataframe based on world_deceased (ime_series_covid19_deaths_global.csv)
  • The NaN cells in the Province/State columns have been changed into strings with value "Not applicable"

daily_rep_clean

  • Dataframe based on daily_report (mm-dd-yyyy.csv)
  • The NaN cells in the Province/State columns have been changed into strings with value "Not applicable"



world_conf_short, world_recov_short, world_deceas_short

  • Dataframe based on world_conf_clean, world_recov_clean, world_deceas_clean
  • GPS coordinates have been dropped

world_conf, world_recov, world_deceas

  • Dataframe based on world_conf_short, world_recov_short, world_deceas_short
  • Only columns with daily data have been selected

world_conf_tot, world_recov_tot, world_deceas_tot

  • Dataframe based on world_conf, world_recov, world_deceas
  • The overall worldwide daily sum has been calculated

world_act_tot

  • List based on world_conf_x, world_recov_x, world_deceas_x (the second and third are subtracted from the first) containing the active cases

world_conf_incr

  • Dataframe based on world_conf_tot containing the daily increments

world_deceas_incr

  • Dataframe based on world_deceas_tot containing the daily increments

daily_rep_short

  • Dataframe based on daily_rep_clean
  • All columns not containing cases counts have been dropped



world_conf_group, world_recov_group, world_deceas_group

  • Dataframe based on world_conf_short, world_recov_short, world_deceas_short
  • Data grouped by Country/Region

daily_rep_group

  • Dataframe based on daily_rep_short
  • Data grouped by Country/Region



days_tot

  • List obtained by using world_confirmed which contains the dates of all the days in m/d format

days_fin

  • List based on days_tot where the first 6 days have been dropped



country_conf_x, country_recov_x, country_deceas_x

  • where country is the Country written with small letters
  • where x is the number of days to skip in the time series starting from the first one
  • Lists obtained by using world_confirmed, world_recovered, world_deceased
  • Data related to Country has been extracted
  • Data related to the first x days has been dropped

country_act_x

  • List based on country_conf_x, country_recov_x, country_deceas_x (the second and third are subtracted from the first) containing the active cases

country_conf_incr_x

  • List based on country_conf_x containing the daily increments

country_deceas_incr_x

  • List based on country_deceas_x containing the daily increments

country_conf_0_perc

  • List based on country_conf_0
  • It containing the confirmed cumulative cases in percentage of the total population

country_deceas_0_perc

  • List based on country_deceas_0
  • It containing the deceased cumulative cases in percentage of the total population

country_conf_pos

  • where country is the Country written with small letters
  • Lists based on country_conf_0
  • Data related to days with zero cumulative cases in the Country has been dropped

6. Domain-Specific Concepts

The basic reproductive number, R0 is the average number of secondary infections generated by one infectious individual. When R0 > 1 the infection is able to spread. The aim of the non-pharmaceutical interventions (NPIs), as social distancing, is to reduce the value of R0.

https://www.imperial.ac.uk/media/imperial-college/medicine/sph/ide/gida-fellowships/Imperial-College-COVID19-transmissibility-25-01-2020.pdf

The Case Fatality Ratio (CFR) is the proportion of detected cases of a given disease that die as a result of it.

Surveillance is typically biased towards detecting clinically severe cases, particularly at the start of an epidemic when diagnostic capacity is limited. This leads to an over estimation of the CFR.

On the other hand, there is a time interval (2/3 weeks) between the onset of symptoms and death or recovery. Therefore, measuring the simple ratio deceased/infected during a growing epidemic does not allow to observe the outcome of all the infected cases, leading to a under estimation of the CFR.

https://www.imperial.ac.uk/media/imperial-college/medicine/sph/ide/gida-fellowships/Imperial-College-COVID19-severity-10-02-2020.pdf

NOTE: The Infection Fatality Rate is the percentage of people that get the infection and then die. This number is much harder to estimate compared to the CFR since we do not know the total amount of people that have been really infected in a certain area.

7. Data Visualization

7.1. Overview

7.1.1. General Comments to the Plots

The following curves are shown in the plots contained in this section:

  • Cumulative confirmed cases
  • Cumulative recovered cases
  • Cumulative deceased cases
  • Cumulative active cases
  • Daily increments in the confirmed cases
  • Daily increments in the recovered cases
  • Daily increments in the deceased cases
  • Daily increments in the active cases

The first four curves show the cumulative cases in a certain region since the start of the epidemic.

The cumulative confirmed cases curve is expected to grow exponentially and then slowly smoothing out towards a horizontal shape. Government decisions and people behavior can affect the way this curve looks like. The aim is to keep the curve not too steep in order not to saturate the capacity of the hospitals in the Country. However, it should be noted that the effects of Government and people actions are not immediate due to the incubation period.

The cumulative recovered cases curve follows the cumulative confirmed cases with a certain delay in time and a lower y value due to the amount of deceased cases.

The cumulative active cases are given by the confirmed cases minus the recovered cases minus the deceased cases. It is the only one of the cumulative cases curves that can decrease over time and this happens when the number of confirmed cases grows slower than the combined number of recovered and deceased cases. This curve is expected to have an (upside down) bell shape.

The new confirmed daily cases show the speed at which the virus is spreading. This curve is expected to have an (upside down) bell shape. This curve shows the daily values and therefore is shows also some noise. Some of this noise might be due to mistakes in reporting the daily data (sometimes data of a certain day is reported together with the next day data). This kind of mistake does not affect the grand total and affects only very little the trend of the curves.

The new recovered daily cases curve looks similar to the new confirmed daily cases curve with a delay in time and lower y values.

The incremental daily active cases curve shows two picks of opposite sign. The x value where the negative curve starts corresponds to the pick of the corresponding cumulative curve.

NOTE: The number of the actual confirmed cases is very likely above the number of the counted confirmed cases since not all population is tested and there might be many infected persons showing no symptoms. However, by assuming a constant testing policy during the all observation period, the rate of changes is unaffected by systematic under-reporting and therefore there is a lot of useful information that can be obtained by those curves.


"The only real data we have is from the flights used by a number of Countries to repatriate their citizens. The all population was tested on those planes. If the population samples given by the passengers of those flights would be representative of the all population, we could conclude that the epidemic is at least 3 times larger compared to what the collected data shows."

Feb 12th, Prof. Neil Ferguson, https://www.imperial.ac.uk/people/neil.ferguson


"By comparing the number of flights that came into a certain Country from the worst affected area in China (Wuhan City) with the cases detected in that Country, it can be bound that the number of cases per flight varies quite a lot depending on the Country.

Singapore had a relatively high number of cases compared to other Countries. By using that data as a benchmark, that is, by assuming the Singapore has detected all the cases, the result is that worldwide approximately 2/3 of the cases have not been detected."

Professor Christl Donnelly, https://www.imperial.ac.uk/people/c.donnelly


More recent serological tests show that the number of actual cases might be up to 10/20 times the number of counted confirmed cases.

7.1.2. A Reference Curve Set

The first complete curves are related to China. Let's analyze the curves related to China either than Hubei province. The curves can be divided in 4 phases which are named here after the shape of the cumulative confirmed cases curve.

1) Exponential increase phase

  • In the first phase the number of cumulative confirmed cases grows exponentially (it grows and it grows faster each day) while the number of recovered and deceased cases is still null (the number of cumulative confirmed cases corresponds to the number of active cases, which corresponds to the number of the "Infectious" in the popular epidemiologic SIR model (')). The increments in the number of confirmed cases shows the left side of a bell shape. The same happens for the incremental active cases.

2) Linear increase phase

  • In the second phase the number of confirmed cases grows at a quite constant speed (the cumulative cases grow in a straight line and the increment curve starts to flatten). In the middle of this phase we see the pick in the number of incremental confirmed cases (R0 has decreased to 1). There is also a pick on the incremental active cases. In this phase we see a quite modest increase in recovered and deceased cases and we start to see that the cumulative active cases curve and the cumulative confirmed cases curve take their own path.

3) Slowed-down increase phase

  • In the third phase the number of confirmed cases grows at a slower and slower speed (the cumulative curve starts to flatten towards a horizontal shape and the incremental confirmed cases curve shows the right side of the bell). R0 starts to decrease below 1. In this phase the number of cumulative recovered and deceased cases keeps growing and the number of cumulative active cases reaches a pick and then starts to decrease. The pick on the cumulative active cases is known as Herd Immunity. In the incremental active cases curve this is seen as the point when the curve changes sign. There is a lag in time between the pick in new confirmed cases seen in phase 2 and the pick in active cases seen in this phase.

4) No increase phase

  • In the fourth phase the number of cumulative confirmed cases remains constant and consequently the corresponding incremental curve is zero (R0 is almost 0). The number of recovered and deceased cases keeps growing and the active cases decrease down towards zero.

Note that a new wave might follow (as it might happen in China outside Hubei).

Note that should the testing policy change during the observation period, the curve might look different.

Whenever containment measures have been adopted in a certain area, the earliest moment in time when it makes sense to start to release them gradually is after the Herd Immunity pick. However, in this case the Herd Immunity has been obtained under certain conditions (the containment measures) and therefore, as soon as those conditions are released, the Heard Immunity is no longer valid. Release of containment measures might cause the curves to differ from this example and might lead to new picks before the active cases curves goes to zero.

(') https://medium.com/data-for-science/epidemic-modeling-101-or-why-your-covid19-exponential-fits-are-wrong-97aa50c55f8

In [80]:
# Plotting daily cumulative cases in the rest of China
cust_line_plot((days_tot, restchina_conf_0, ".", '-', 0, "confirmed cases"),
               (days_tot, restchina_recov_0, ".", '-', 2, "recovered cases"),
               (days_tot, restchina_deceas_0, ".", '-', 3, "deceased cases"),
               (days_tot, restchina_act_0, ".", '-', 1, "active cases"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative cases in China "\
                     "either than Hubei over time",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=10, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               first_line_x='1/30', first_line_col=7, first_line_ls=':',
               first_line_x_l='End of the exponential increase phase',
               second_line_x='2/5', second_line_col=7, second_line_ls='--',
               second_line_x_l='End of the linear increase phase',
               third_line_x='2/22', third_line_col=7, third_line_ls='-.',
               third_line_x_l='End of the slowed-down increase phase',
               fourth_line_x='3/13', fourth_line_col=7, fourth_line_ls='-',
               fourth_line_x_l='End of the no increase phase',
               fifth_line_x='2/11', fifth_line_col=6, fifth_line_ls='--',
               fifth_line_x_l='Herd Immunity')
In [81]:
# Plotting daily increments in confirmed cases in the rest of China
cust_bar_plot((days_tot, restchina_conf_incr_0, 0, "New daily confirmed cases"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 new daily confirmed cases in China "\
                     "either than Hubei",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=10, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               first_line_x='1/30', first_line_col=7, first_line_ls=':',
               first_line_x_l='End of the exponential increase phase',
               second_line_x='2/5', second_line_col=7, second_line_ls='--',
               second_line_x_l='End of the linear increase phase',
               third_line_x='2/22', third_line_col=7, third_line_ls='-.',
               third_line_x_l='End of the slowed-down increase phase',
               fourth_line_x='3/13', fourth_line_col=7, fourth_line_ls='-',
               fourth_line_x_l='End of the no increase phase',
               fifth_line_x='2/11', fifth_line_col=6, fifth_line_ls='--',
               fifth_line_x_l='Herd Immunity')
In [82]:
# Plotting new daily deceased cases in the rest of China
cust_bar_plot((days_tot, calc_increments(restchina_deceas_0), 3, 
               "Daily deceased cases by COVID-19"),
              figsize_w=18, figsize_h=12,
              title="Coronavirus COVID-19 new daily (reported) deceased cases "\
                    "in China either than Hubei",
              title_fs=18, title_offset=20,
              rem_borders=True,
              label_fs=12, tick_fs=10, 
              x_label="month/day",
              rot=90,
              y_label=None,
              legend=True,
              leg_fs=12,
              legend_loc=0,
              first_line_x='1/30', first_line_col=7, first_line_ls=':',
              first_line_x_l='End of the exponential increase phase',
              second_line_x='2/5', second_line_col=7, second_line_ls='--',
              second_line_x_l='End of the linear increase phase',
              third_line_x='2/22', third_line_col=7, third_line_ls='-.',
              third_line_x_l='End of the slowed-down increase phase',
              fourth_line_x='3/13', fourth_line_col=7, fourth_line_ls='-',
              fourth_line_x_l='End of the no increase phase',
              fifth_line_x='2/11', fifth_line_col=6, fifth_line_ls='--',
              fifth_line_x_l='Herd Immunity')
In [83]:
# Plotting new daily recovered cases in the rest of China
cust_bar_plot((days_tot, calc_increments(restchina_recov_0), 2, 
               "Daily recovered cases by COVID-19"),
              figsize_w=18, figsize_h=12,
              title="Coronavirus COVID-19 new daily (reported) recovered cases "\
                    "in China either than Hubei",
              title_fs=18, title_offset=20,
              rem_borders=True,
              label_fs=12, tick_fs=10, 
              x_label="month/day",
              rot=90,
              y_label=None,
              legend=True,
              leg_fs=12,
              legend_loc=0,
              first_line_x='1/30', first_line_col=7, first_line_ls=':',
              first_line_x_l='End of the exponential increase phase',
              second_line_x='2/5', second_line_col=7, second_line_ls='--',
              second_line_x_l='End of the linear increase phase',
              third_line_x='2/22', third_line_col=7, third_line_ls='-.',
              third_line_x_l='End of the slowed-down increase phase',
              fourth_line_x='3/13', fourth_line_col=7, fourth_line_ls='-',
              fourth_line_x_l='End of the no increase phase',
              fifth_line_x='2/11', fifth_line_col=6, fifth_line_ls='--',
              fifth_line_x_l='Herd Immunity')
In [84]:
# Plotting daily increments in the active cases in the rest of China
cust_bar_plot((days_tot, calc_increments(restchina_act_0), 1, 
               "Daily increments in the active cases by COVID-19"),
              figsize_w=18, figsize_h=12,
              title="Coronavirus COVID-19 increments in the daily active cases "\
                    "in China either than Hubei",
              title_fs=18, title_offset=20,
              rem_borders=True,
              label_fs=12, tick_fs=10, 
              x_label="month/day",
              rot=90,
              y_label=None,
              legend=True,
              leg_fs=12,
              legend_loc=0,
              first_line_x='1/30', first_line_col=7, first_line_ls=':',
              first_line_x_l='End of the exponential increase phase',
              second_line_x='2/5', second_line_col=7, second_line_ls='--',
              second_line_x_l='End of the linear increase phase',
              third_line_x='2/22', third_line_col=7, third_line_ls='-.',
              third_line_x_l='End of the slowed-down increase phase',
              fourth_line_x='3/13', fourth_line_col=7, fourth_line_ls='-',
              fourth_line_x_l='End of the no increase phase',
              fifth_line_x='2/11', fifth_line_col=6, fifth_line_ls='--',
              fifth_line_x_l='Herd Immunity')

7.2. Finnish Internal Situation

Unfortunately, the Finnish Institute for Health and Welfare (THL) does not publish reliable daily data about the recovered cases and therefore it is not possible to draw an accurate curve for the active cases.

Notes:

The increased speed in the confirmed cases on 4/4 is due to change in testing policy.

The confirmed cases data from 3/12 has been reported on 3/13.

Obviously, there is something wrong in the source data since it shows that the cumulative deaths on 1/6 are smaller than the cumulative deaths of 31/5, hence the negative value for the increment in deceased cases on 1/6. The same applies to 4/6.

In [85]:
print("Error data in deceased cases in Finland:")
find_error_days(finland_deceas_0)
Error data in deceased cases in Finland:
['4/6', '6/1', '7/15']
In [86]:
# Plotting daily cumulative cases in Finland
cust_line_plot((days_fin, finland_conf_6, ".", '-', 0, "confirmed cases"),
               #(days_fin, finland_recov_6, ".", '-', 2, "recovered cases"),
               (days_fin, finland_deceas_6, ".", '-', 3, "deceased cases"),
               #(days_fin, finland_act_6, ".", '-', 1, "active cases"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative cases in Finland over time",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=10, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0,
               first_line_x='3/12', first_line_col=6,
               first_line_ls=':', first_line_x_l='First actions',
               second_line_x='3/16', second_line_col=6,
               second_line_ls='--', second_line_x_l='State of emergency declared',
               third_line_x='3/28', third_line_col=6,
               third_line_ls='-.', third_line_x_l='Additional actions',
               fourth_line_x='4/11', fourth_line_col=6,
               fourth_line_ls='-', fourth_line_x_l='Tighter border control',
               fifth_line_x='4/15', fifth_line_col=8,
               fifth_line_ls='-.', fifth_line_x_l='Uusima border opened',
               sixth_line_x='5/14', sixth_line_col=8,
               sixth_line_ls='--', sixth_line_x_l='More releasing misures',
               seventh_line_x='6/1', seventh_line_col=8,
               seventh_line_ls=':', seventh_line_x_l='Further releasing',
               eighth_line_x='6/15', eighth_line_col=8,
               eighth_line_ls='-', eighth_line_x_l='End of state of emergency')
In [87]:
print("Concrete actions by the Finnish government")
measures.style.set_properties(**{'text-align': 'left'}).\
set_table_styles([ dict(selector='th', props=[('text-align', 'left')] ) ]).hide_index()
Concrete actions by the Finnish government
Out[87]:
Date Actions
12.3. First containment measures: gathering of more than 500 people banned
16.3. State of emergency declared: closing shools, universities, museums, theatres, libraries, sport facilities; gathering of more than 10 people banned
28.3. Additional containment measures: Uusimaa region borders closed, restaurant dining forbidden
11.4. Additional containment measures: No passengers in ships from Germany, Sweden, Estonia
15.4. First releasing measures: Uusima border re-opened
14.5. More releasing misures: schools opening, business travell allowed within Schengen
1.6. Further releasing: gathering up to 50 people allowed, reopening of bars and restaurants, reopening of museums and theatres
15.6. End of state of emergency
In [88]:
# Plotting new daily confirmed Coronavirus cases in Finland
cust_bar_plot((days_fin, finland_conf_incr_6, 0, "New daily confirmed cases"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 new daily confirmed cases in Finland",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=10, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0,
               first_line_x='3/12', first_line_col=6,
               first_line_ls=':', first_line_x_l='First actions',
               second_line_x='3/16', second_line_col=6,
               second_line_ls='--', second_line_x_l='State of emergency declared',
               third_line_x='3/28', third_line_col=6,
               third_line_ls='-.', third_line_x_l='Additional actions',
               fourth_line_x='4/11', fourth_line_col=6,
               fourth_line_ls='-', fourth_line_x_l='Tighter border control',
               fifth_line_x='4/15', fifth_line_col=8,
               fifth_line_ls='-.', fifth_line_x_l='Uusima border opened',
               sixth_line_x='5/14', sixth_line_col=8,
               sixth_line_ls='--', sixth_line_x_l='More releasing misures',
               seventh_line_x='6/1', seventh_line_col=8,
               seventh_line_ls=':', seventh_line_x_l='Further releasing',
               eighth_line_x='6/15', eighth_line_col=8,
               eighth_line_ls='-', eighth_line_x_l='End of state of emergency')
In [89]:
# Plotting new daily deceased cases in Finland
cust_bar_plot((days_fin, finland_deceas_incr_6, 3, ""),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 new daily deceased cases in Finland",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=10, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=False,
               leg_fs=12,
               legend_loc=0)

7.3. Comparison with the Closest Neighboring Countries

Sweden and Russia have a much higher number of cumulative confirmed cases per capita compared to Finland. Also, they have a much higher number of new daily confirmed cases.

The number of confirmed cases per capita considering the all world is similar to Finland but it is on an increasing path whereas Finnish curve tends to remain constant.

Also, Sweden has currently a much higher number of cumulative deaths per capita compared to Finland. Therefore, at least for Sweden, it is unlikely that the comparison is biased by a different testing policy.

In [90]:
# Comparing Finnish per capita cumulative confirmed cases with Sweden and Russia
cust_line_plot((days_tot, finland_conf_0_perc, ".", '-', 0, "Finland"),
               (days_tot, sweden_conf_0_perc, ".", '-', 1, "Sweden"),
               (days_tot, norway_conf_0_perc, ".", '-', 6, "Norway"),
               (days_tot, russia_conf_0_perc, ".", '-', 3, "Russia"),
               (days_tot, estonia_conf_0_perc, ".", '-', 7, "Estonia"),
               (days_tot, world_conf_perc, ".", '-', 4, "World Total"),
               figsize_w=18, figsize_h=12,
               title="COVID-19 per capita cumulative confirmed cases "\
                     "in Finland compared to the closest neighboring Countries",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=10, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)
In [91]:
# Comparing Finnish per capita cumulative deceased cases with Sweden and Russia
cust_line_plot((days_tot, finland_deceas_0_perc, ".", '-', 0, "Finland"),
               (days_tot, sweden_deceas_0_perc, ".", '-', 1, "Sweden"),
               (days_tot, norway_deceas_0_perc, ".", '-', 6, "Norway"),
               (days_tot, russia_deceas_0_perc, ".", '-', 3, "Russia"),
               (days_tot, estonia_deceas_0_perc, ".", '-', 7, "Estonia"),
               (days_tot, world_deceas_perc, ".", '-', 4, "World Total"),
               figsize_w=18, figsize_h=12,
               title="COVID-19 per capita cumulative deceased cases "\
                     "in Finland compared to the closest neighboring Countries",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=10, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)
In [92]:
# Comparing Finnish per capita daily confirmed cases with the closest neighboring Countries

finland_conf_incr_0_perc = pop_perc(finland_conf_incr_0, finland_pop)
sweden_conf_incr_0_perc = pop_perc(sweden_conf_incr_0, sweden_pop)
norway_conf_incr_0_perc = pop_perc(norway_conf_incr_0, norway_pop)
russia_conf_incr_0_perc = pop_perc(russia_conf_incr_0, russia_pop)
estonia_conf_incr_0_perc = pop_perc(estonia_conf_incr_0, estonia_pop)

plot_stacked_bar(days_tot,
                 [finland_conf_incr_0_perc, sweden_conf_incr_0_perc, norway_conf_incr_0_perc,
                  russia_conf_incr_0_perc, estonia_conf_incr_0_perc],
                 ["Finland", "Sweden", "Norway", "Russia", "Estonia"],
                 col=[0, 1, 6, 3, 7],
                 multidim=True, figsize_w=18, figsize_h=12,
                 title="COVID-19 per capita daily confirmed cases in "\
                       "Finland compared to the closest neighboring Countries",
                 title_fs=18,
                 frame=False,
                 category_labels=days_tot,
                 label_fs = 12, ticks_fs=10, 
                 x_label="month/day", rot=90,
                 y_label="Total of cases in all the Countries",
                 legend=True, legend_loc = 2, legend_fs=12,
                 add_text=None, addtext_x=0, addtext_y=0, addtext_fs=10)

7.3.1. Comparison with Other Scandinavian Countries and Estonia

Description of the plots of this section

It appears that the Finnish curve is quite smooth compared to the other curves. Only Iceland and Estonia have a smoother curve. This would suggest that the virus is not spreading faster in Finland compared to most of the other Scandinavian Countries. By shifting all the curves so that they start for each Country in the day of the first confirmed case, the Finnish curve is the slowest to grow but then crosses the curves of Iceland and Estonia.

Even though the virus started later in Finland, the first recovered case happened much earlier than other Scandinavian Countries.

Finland has the lowest number of deceased cases after Norway and Estonia (Sweden has the highest).

The high numbers for Sweden do not surprise due to the quite relaxed containment policy in the Country.

NOTE: It should be noted that the testing policy in each Country affects considerably the way the curve looks like. The less people you test, the better the curve looks like.

NOTE: The data from Denmark does not include Faroe Islands and Greenland.

In [93]:
# Comparing cumulative confirmed cases over time in Scandinavia plus Estonia
cust_line_plot((days_fin, finland_conf_6, ".", '-', 0, "Finland"),
               (days_fin, denmark_conf_6, ".", '-', 3, "Denmark"),
               (days_fin, norway_conf_6, ".", '-', 6, "Norway"),
               (days_fin, sweden_conf_6, ".", '-', 8, "Sweden"),
               (days_fin, iceland_conf_6, ".", '-', 4, "Iceland"),
               (days_fin, estonia_conf_6, ".", '-', 7, "Estonia"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative confirmed cases in "\
                     "Scandinavia and Estonia over time",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=10, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)
In [94]:
# Comparing cumulative confirmed cases over time in Scandinavia plus Estonia
# starting form the day of the first confirmed case in Finland
cust_line_plot((list(range(len(finland_conf_pos))), finland_conf_pos,
                ".", '-', 0, "Finland"),
               (list(range(len(denmark_conf_pos))), denmark_conf_pos,
                ".", '-', 3, "Denmark"),
               (list(range(len(norway_conf_pos))), norway_conf_pos,
                ".", '-', 6, "Norway"),
               (list(range(len(sweden_conf_pos))), sweden_conf_pos,
                ".", '-', 8, "Sweden"),
               (list(range(len(iceland_conf_pos))), iceland_conf_pos,
                ".", '-', 4, "Iceland"),
               (list(range(len(estonia_conf_pos))), estonia_conf_pos,
                ".", '-', 7, "Estonia"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative confirmed cases "\
                     "in Scandinavia and Estonia over time",
               title_fs=18, title_offset=20,
               rem_borders=True, 
               label_fs=12, tick_fs=10, 
               x_label="Days since the first confirmed case in the Country",
               rot=0,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)
In [95]:
# Comparing new daily confirmed Coronavirus cases in Scandinavia plus Estonia
cust_line_plot((days_fin, finland_conf_incr_6, ".", '-', 0, "Finland"),
               (days_fin, denmark_conf_incr_6, ".", '-', 3, "Denmark"),
               (days_fin, norway_conf_incr_6, ".", '-', 6, "Norway"),
               (days_fin, sweden_conf_incr_6, ".", '-', 8, "Sweden"),
               (days_fin, iceland_conf_incr_6, ".", '-', 4, "Iceland"),
               (days_fin, estonia_conf_incr_6, ".", '-', 7, "Estonia"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 new daily confirmed cases "\
                     "in Scandinavia and Estonia",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=10, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)

Comments to the next two plots:

Data related to Iceland is corrupted (cumulative data cannot decrease) so the related plot is not shown.

In [96]:
# Comparing cumulative recovered cases over time in Scandinavia plus Estonia
plot_stacked_bar(days_fin,
                 [finland_recov_6, denmark_recov_6, norway_recov_6, sweden_recov_6, estonia_recov_6],
                 ["Finland", "Denmark", "Norway", "Sweden", "Estonia"],
                 col=[0, 3, 6, 8, 7],
                 multidim=True, figsize_w=18, figsize_h=12,
                 title="Coronavirus COVID-19 cumulative (reported) recovered cases in "\
                       "Scandinavia and Estonia over time",
                 title_fs=18,
                 frame=False,
                 category_labels=days_tot,
                 label_fs = 12, ticks_fs=10, 
                 x_label="month/day", rot=90,
                 y_label="Total of cases in all the Countries",
                 legend=True, legend_loc = 2, legend_fs=12,
                 add_text=None, addtext_x=0, addtext_y=0, addtext_fs=10)
In [97]:
# Comparing cumulative deceased cases over time in Scandinavia plus Estonia
plot_stacked_bar(days_fin,
                 [finland_deceas_6, denmark_deceas_6, norway_deceas_6, sweden_deceas_6, estonia_deceas_6],
                 ["Finland", "Denmark", "Norway", "Sweden", "Estonia"],
                 col=[0, 3, 6, 8, 7],
                 multidim=True, figsize_w=18, figsize_h=12,
                 title="Coronavirus COVID-19 cumulative (reported) deceased cases in "\
                       "Scandinavia and Estonia over time",
                 title_fs=18,
                 frame=False,
                 category_labels=days_tot,
                 label_fs = 12, ticks_fs=10, 
                 x_label="month/day", rot=90,
                 y_label="Total of cases in all the Countries",
                 legend=True, legend_loc = 2, legend_fs=12,
                 add_text=None, addtext_x=0, addtext_y=0, addtext_fs=10)

7.4. Comparison with other European Countries

Finland has also the lowest curves compared to other European Countries (except for Luxemburg). However, it shall be noted that those are absolute values which are not normalized by taking into consideration the Country population.

The plots related to the new confirmed cases show the same pattern for all those Countries (except for Poland). This might be due to the fact that those plots are very much dependent on how many people are tested in a certain day.

Switzerland has managed to keep a relatively low curve. France has experienced a noticeable increase in the recorded confirmed cases around 4/11. Germany has managed to keep a low curve of the deceased cases despite the relatively high curve of the confirmed cases.

NOTE: When comparing those curves please note also that the testing policy in each Country affects considerably the way the curve looks like. The less people you test, the better the curve looks like.

NOTE: The data from France and Netherlands does not include offshore territories.

NOTE: Obviously, the following data is wrong since the cumulative data cannot decrease (leading to a negative daily increment):

In [98]:
print("Error data in confirmed cases in Spain:")
find_error_days(spain_conf_0)
print("Error data in confirmed cases in France:")
find_error_days(france_conf_0)
print("Error data in confirmed cases in Portugal:")
find_error_days(portugal_conf_0)
print("Error data in deceased cases in Spain:")
find_error_days(spain_deceas_0)
print("Error data in deceased cases in France:")
find_error_days(france_deceas_0)
Error data in confirmed cases in Spain:
['4/24', '5/25']
Error data in confirmed cases in France:
['4/18', '4/22', '4/29', '5/13', '5/16', '5/24', '5/26', '6/2', '6/24', '6/25', '6/27', '6/28', '7/4', '7/5', '7/8', '7/11', '7/12', '7/14', '7/18', '7/19']
Error data in confirmed cases in Portugal:
['5/2']
Error data in deceased cases in Spain:
['5/25']
Error data in deceased cases in France:
['5/16', '5/19', '6/27', '7/8', '7/11', '7/14', '7/18', '7/21']
In [99]:
# Comparing cumulative confirmed cases over time for Finland,
# Italy, Spain, Germany, France, Switzerland, Belgium, Netherlands and Portugal
cust_line_plot((days_tot, finland_conf_0, ".", '-', 0, "Finland"),
               (days_tot, italy_conf_0, ".", '-', 2, "Italy"),
               (days_tot, spain_conf_0, ".", '-', 1, "Spain"),
               (days_tot, germany_conf_0, ".", '-', 4, "Germany"),
               (days_tot, france_conf_0, ".", '-', 3, "France"),              
               (days_tot, switzerland_conf_0, ".", '-', 6, "Switzerland"),
               (days_tot, belgium_conf_0, ".", '-', 8, "Belgium"),
               (days_tot, netherlands_conf_0, ".", '-', 7, "Netherlands"),
               (days_tot, portugal_conf_0, ".", '-', 5, "Portugal"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative confirmed cases "\
                     "in Finland compared to \nItaly, Spain, Germany, France "\
                     "Switzerland, Belgium, Netherlands and Portugal",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=10, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)
In [100]:
# Comparing cumulative deceased cases over time for Finland,
# Italy, Spain, Germany, France, Switzerland, Belgium, Netherlands and Portugal
cust_line_plot((days_tot, finland_deceas_0, ".", '-', 0, "Finland"),
               (days_tot, italy_deceas_0, ".", '-', 2, "Italy"),
               (days_tot, spain_deceas_0, ".", '-', 1, "Spain"),
               (days_tot, germany_deceas_0, ".", '-', 4, "Germany"),
               (days_tot, france_deceas_0, ".", '-', 3, "France"),              
               (days_tot, switzerland_deceas_0, ".", '-', 6, "Switzerland"),
               (days_tot, belgium_deceas_0, ".", '-', 8, "Belgium"),
               (days_tot, netherlands_deceas_0, ".", '-', 7, "Netherlands"),
               (days_tot, portugal_deceas_0, ".", '-', 5, "Portugal"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative deceased cases "\
                     "in Finland compared to \nItaly, Spain, Germany, France "\
                     "Switzerland, Belgium and Netherlands and Portugal",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=10, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)
In [101]:
# Comparing cumulative confirmed cases over time for Finland,
# Austria, Luxembourg, Ireland and Poland
cust_line_plot((days_tot, finland_conf_0, ".", '-', 0, "Finland"),               
               (days_tot, austria_conf_0, ".", '-', 3, "Austria"),
               (days_tot, luxembourg_conf_0, ".", '-', 9, "Luxembourg"),
               (days_tot, ireland_conf_0, ".", '-', 1, "Ireland"),
               (days_tot, poland_conf_0, ".", '-', 6, "Poland"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative confirmed cases "\
                     "in Finland compared to \nAustria, "\
                     "Luxembourg, Ireland and Poland",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=10, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)
In [102]:
# Comparing cumulative deceased cases over time for Finland,
# Austria, Luxembourg, Ireland and Poland
cust_line_plot((days_tot, finland_deceas_0, ".", '-', 0, "Finland"),               
               (days_tot, austria_deceas_0, ".", '-', 3, "Austria"),
               (days_tot, luxembourg_deceas_0, ".", '-', 9, "Luxembourg"),
               (days_tot, ireland_deceas_0, ".", '-', 1, "Ireland"),
               (days_tot, poland_deceas_0, ".", '-', 6, "Poland"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative deceased cases "\
                     "in Finland compared to \nAustria, "\
                     "Luxembourg, Ireland and Poland",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=10, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)

7.5. Situation in China

Two sets of plots are shown here: one for the Hubei province where the infection has started and the other one for the rest of China.

The first plot of the two sets shows the cumulative confirmed cases broken down by deceased, recovered and active cases. Whereas in Hubei there has not been yet a second wave, that is the case for the rest of China.

The second plot shows separately the cumulative curves for the confirmed, recovered, deceased and active cases. The curve for the rest of China has been analyzed in details in section 7.1.2.

Note: There is something wrong in the source data for Hubei province on 4/17 since the cumulative recovered cases cannot decrease over time. Also, the incremental data (increment in confirmed cases) for Hubei province from 2/12 has been reported on 2/13.

In [103]:
print("Error data in confirmed cases in Hubei:")
find_error_days(hubei_recov_0)
Error data in confirmed cases in Hubei:
['4/17']
In [104]:
# Plotting daily cumulative cases in Hubei
plot_stacked_bar(days_tot,
                 [hubei_deceas_0, hubei_recov_0, hubei_act_0],
                 ["deceased cases", "recovered cases", "active cases"],
                 col=[3, 2, 1],
                 multidim=True, figsize_w=18, figsize_h=12,
                 title="COVID-19 cumulative cases in Hubei (China) over time",
                 title_fs=18,
                 frame=False,
                 category_labels=days_tot,
                 label_fs = 12, ticks_fs=10, 
                 x_label="month/day", rot=90,
                 y_label="confirmed cases",
                 legend=True, legend_loc = 2, legend_fs=12,
                 add_text=None, addtext_x=0, addtext_y=0, addtext_fs=10)
In [105]:
# Plotting daily cumulative cases in Hubei
cust_line_plot((days_tot, hubei_conf_0, ".", '-', 0, "confirmed cases"),
               (days_tot, hubei_recov_0, ".", '-', 2, "recovered cases"),
               (days_tot, hubei_deceas_0, ".", '-', 3, "deceased cases"),
               (days_tot, hubei_act_0, ".", '-', 1, "active cases"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative cases in Hubei (China) over time",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=10, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)
In [106]:
# Plotting daily increments in confirmed cases in Hubei province in China
cust_bar_plot((days_tot, hubei_conf_incr_0, 0, ""),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 new daily confirmed cases "\
                     "in Hubei province (China)",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=10, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=False,
               leg_fs=12,
               legend_loc=0)
In [107]:
# Plotting daily cumulative cases in the rest of China
plot_stacked_bar(days_tot,
                 [restchina_deceas_0, restchina_recov_0, restchina_act_0],
                 ["deceased cases", "recovered cases", "active cases"],
                 col=[3, 2, 1],
                 multidim=True, figsize_w=18, figsize_h=12,
                 title="COVID-19 cumulative cases in China either than Hubei over time",
                 title_fs=18,
                 frame=False,
                 category_labels=days_tot,
                 label_fs = 12, ticks_fs=10, 
                 x_label="month/day", rot=90,
                 y_label="confirmed cases",
                 legend=True, legend_loc = 2, legend_fs=12,
                 add_text=None, addtext_x=0, addtext_y=0, addtext_fs=10)
In [108]:
# Plotting daily cumulative cases in the rest of China
cust_line_plot((days_tot, restchina_conf_0, ".", '-', 0, "confirmed cases"),
               (days_tot, restchina_recov_0, ".", '-', 2, "recovered cases"),
               (days_tot, restchina_deceas_0, ".", '-', 3, "deceased cases"),
               (days_tot, restchina_act_0, ".", '-', 1, "active cases"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative cases in China "\
                     "either than Hubei over time",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=10, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)
In [109]:
# Plotting daily increments in confirmed cases in the rest of China
cust_bar_plot((days_tot, restchina_conf_incr_0, 0, ""),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 new daily confirmed cases in China "\
                     "either than Hubei",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=10, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=False,
               leg_fs=12,
               legend_loc=0)

7.6. Situation in Italy

Italy has been the first Country after China (and the first Country in Europe) that has been hit hard from the virus and its government, as opposite to Finland, keeps very comprehensive public data. Therefore, analysis of Italian curves might be useful also to have some hints about Finnish situation.

Italy has had a very confusing strategy and decision-making process in the beginning of the epidemic and this has been probably one of the causes of the high number of cases. However, after an initial period of very poor handling of the situation, quite strict containment measures have been decided and this has led to curves whose shape that are quite close to the curves from China with the main difference that the slowed down phase has been smoother. So, there has been a exponential increase in the number of confirmed cases, followed by a short linear phase and a quite long slowed down phase, which is still ongoing.

Note:

  • Data from 3/12 has been reported on 3/13.
  • The confirmed cases on 6/19 are wrong since the incremental value cannot be negative.
  • The data about daily deceased cases on 6/24 is wrong since the incremental value cannot be negative.
In [110]:
# Plotting daily cumulative cases in Italy
cust_line_plot((days_tot, italy_conf_0, ".", '-', 0, "confirmed cases"),
               (days_tot, italy_recov_0, ".", '-', 2, "recovered cases"),
               (days_tot, italy_deceas_0, ".", '-', 3, "deceased cases"),
               (days_tot, italy_act_0, ".", '-', 1, "active cases"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative cases in Italy over time",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=10, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)
In [111]:
# Plotting daily cumulative cases in Italy
plot_stacked_bar(days_tot,
                 [italy_deceas_0, italy_recov_0, italy_act_0],
                 ["deceased cases", "recovered cases", "active cases"],
                 col=[3, 2, 1],
                 multidim=True, figsize_w=18, figsize_h=12,
                 title="COVID-19 cumulative cases in Italy over time",
                 title_fs=18,
                 frame=False,
                 category_labels=days_tot,
                 label_fs = 12, ticks_fs=10, 
                 x_label="month/day", rot=90,
                 y_label="confirmed cases",
                 legend=True, legend_loc = 2, legend_fs=12,
                 add_text=None, addtext_x=0, addtext_y=0, addtext_fs=10)
In [112]:
# Plotting new daily confirmed Coronavirus cases in Italy
cust_bar_plot((days_tot, italy_conf_incr_0, 0, ""),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 new daily confirmed cases in Italy",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=10, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=False,
               leg_fs=12,
               legend_loc=0)
In [113]:
# Plotting increments in the active cases in Italy
cust_bar_plot((days_tot, calc_increments(italy_act_0), 1, ""),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 increments in the active cases in Italy",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=10, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=False,
               leg_fs=12,
               legend_loc=0)
In [114]:
# Plotting new daily deceased cases in Italy
cust_bar_plot((days_tot, italy_deceas_incr_0, 3, ""),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 new daily deceased cases in Italy",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=10, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=False,
               leg_fs=12,
               legend_loc=0)

7.7. UK and US

UK and US have followed quite relaxed policies in containing the spread of the virus during the first days.

NOTE: The data from UK does not include the Isle of Man and the Channel Islands.

NOTE: The following data is wrong since the cumulative data cannot decrease (leading to a negative daily increment):

In [115]:
print("Error data in confirmed cases in UK:")
find_error_days(uk_conf_0)
Error data in confirmed cases in UK:
[]
In [116]:
# Comparing cumulative confirmed Coronavirus cases in UK and US
cust_line_plot(#(days_tot, finland_conf_0, ".", '-', 0, "Finland"),
               (days_tot, uk_conf_0, ".", '-', 4, "UK"),
               (days_tot, us_conf_0, ".", '-', 3, "US"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative confirmed cases "\
                     "in UK and US",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=10, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)
In [117]:
# Comparing cumulative deceased Coronavirus cases in UK and US
cust_line_plot(#(days_tot, finland_deceas_0, ".", '-', 0, "Finland"),
               (days_tot, uk_deceas_0, ".", '-', 4, "UK"),
               (days_tot, us_deceas_0, ".", '-', 3, "US"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative deceased cases "\
                     "in UK and US",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=10, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)
In [118]:
# Comparing new daily confirmed cases Coronavirus cases in UK and US
cust_line_plot(#(days_tot, finland_conf_incr_0, ".", '-', 0, "Finland"),
               (days_tot, uk_conf_incr_0, ".", '-', 4, "UK"),
               (days_tot, us_conf_incr_0, ".", '-', 3, "US"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 new daily confirmed cases "\
                     "in UK and US",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=10, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)

7.8. Brazil, Russia and India

Whereas in the first half of the year the virus has hit mostly China, Europe and the US, by the end of winter the number of active cases in China was down to very low values and the same has happened in most of Europe by the end of spring.

However, in other parts of the world, like Russia, India and Brazil the curves are still in a growing phase at the beginning of summer.

The following chart shows the cumulative confirmed cases in those 3 Countries. For reason of scale, Finnish curve would not bi visible in the same chart so Italy curve has been added to show a comparison with the number of cases over time in one of the most hit Countries in Europe.

The second chart shows a similar plot for the deceased cases (which are less sensible to the Country specific testing policy).

While Brazil and Russia have entered the linear growing phase, India is still in the exponential growing phase.

In [119]:
# Plotting cumulative confirmed cases over time for Brazil, Russia and India
# compared to Italy
cust_line_plot((days_tot, brazil_conf_0, ".", '-', 2, "Brazil"),
               (days_tot, russia_conf_0, ".", '-', 0, "Russia"),
               (days_tot, india_conf_0, ".", '-', 1, "India"),
               (days_tot, italy_conf_0, ".", '-', 3, "Italy"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative confirmed cases "\
                     "in Brazil, Russia and India",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=10, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)
In [120]:
# Plotting cumulative deceased cases over time for Brazil, Russia and India
# compared to Italy
cust_line_plot((days_tot, brazil_deceas_0, ".", '-', 2, "Brazil"),
               (days_tot, russia_deceas_0, ".", '-', 0, "Russia"),
               (days_tot, india_deceas_0, ".", '-', 1, "India"),
               (days_tot, italy_deceas_0, ".", '-', 3, "Italy"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative deceased cases "\
                     "in Brazil, Russia and India",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=10, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)

7.9. Normalizing by Country population

7.9.1. List of Variables Affecting Potentially the Curves

The curves related to the cumulative confirmed cases seem to have similar shape. The main difference seems to be the height.

The height of those curves can differ for different reasons, including:

  • the Country overall population (obviously the more people are in the Country, the more people can get infected)
  • the population density (higher is the population density, easier it might be for the virus to spread)
  • demographics (older is the population, easier is for the virus to kill)
  • average health conditions of the population (healthier is the population, harder is for the virus to kill)
  • genetics ?
  • climate (the virus might have more difficulty to survive in too cold or too hot weather)
  • pollution (there are preliminary indications that pollution might facilitate the spread of the virus)
  • possible mutations of the virus in that area
  • the testing policy in the Country (the more people a Country tests, the more infected cases might be discovered)
  • which containment measures have been taken by Authorities and how early
  • how well the population has followed the containment measures
  • whether the Country is in a central area and whether there is a lot of movement of people
  • last but not least, the stage in which the Country is (the curves follow all the same smooth-steep-smooth shape so Countries where the virus has just started to spread show lower curves)

It might be interesting to isolate the first variable, Country population, by dividing the values by the Country population in order to calculate the amount of cases per capita. The result is shown in plots in this section.

NOTE: The Country population figures are approximative.

7.9.2. Confirmed Cases: Summary of Findings from the Analysis

The plots show that the other variables still can affect the curve as much as 10 times..

When comparing Scandinavian Countries and Estonia, Finland has the lowest number of confirmed cases per capita. Iceland has the highest number.

Among the analyzed European Countries, Luxemburg has the highest confirmed cases curve, followed by Spain and Belgium (which have values that are comparable with Iceland). Poland is the only Country among those ones that have been analyzed, that has a confirmed cases curve lower than Finland.

Note that one of the reasons why UK and Finland curves started pretty low might be due to the fact that they are quite isolated geographically and therefore the virus started to spread later.

However, those curves clearly show that in Countries that have not taken prompt containment actions, such as UK, US and Sweden, those curves started to take a steeper shape.

In [121]:
# Comparing cumulative confirmed cases over time for Finland,
# and other Scandinavian Countries plus Estonia in percentage of the Country population
cust_line_plot((days_tot, finland_conf_0_perc, ".", '-', 0, "Finland"),
               (days_tot, denmark_conf_0_perc, ".", '-', 3, "Denmark"),
               (days_tot, norway_conf_0_perc, ".", '-', 6, "Norway"),
               (days_tot, sweden_conf_0_perc, ".", '-', 8, "Sweden"),
               (days_tot, iceland_conf_0_perc, ".", '-', 4, "Iceland"),
               (days_tot, estonia_conf_0_perc, ".", '-', 7, "Estonia"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative confirmed cases "\
                     "in Finland and other Scandinavian Countries plus Estonia \n"\
                     "in percentage of each Country population",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=10, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)
In [122]:
# Comparing cumulative confirmed cases over time for Finland,
# and other European Countries in percentage of the Country population
cust_line_plot((days_tot, finland_conf_0_perc, ".", '-', 0, "Finland"),
               (days_tot, italy_conf_0_perc, ".", '-', 2, "Italy"),
               (days_tot, spain_conf_0_perc, ".", '-', 1, "Spain"),
               (days_tot, germany_conf_0_perc, ".", '-', 4, "Germany"),
               (days_tot, france_conf_0_perc, ".", '-', 3, "France"),
               (days_tot, switzerland_conf_0_perc, ".", '-', 6, "Switzerland"),
               (days_tot, luxembourg_conf_0_perc, ".", '-', 9, "Luxembourg"),
               (days_tot, belgium_conf_0_perc, ".", '-', 8, "Belgium"),
               (days_tot, ireland_conf_0_perc, ".", '-', 7, "Ireland"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative confirmed cases "\
                     "in Finland and other European Countries \n"\
                     "in percentage of each Country population",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=10, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)
In [123]:
# Comparing cumulative confirmed cases over time for Finland,
# and other European Countries + UK & US in percentage of the Country population
cust_line_plot((days_tot, finland_conf_0_perc, ".", '-', 0, "Finland"),
               (days_tot, netherlands_conf_0_perc, ".", '-', 4, "Netherlands"),
               (days_tot, austria_conf_0_perc, ".", '-', 3, "Austria"),
               (days_tot, portugal_conf_0_perc, ".", '-', 2, "Portugal"),
               (days_tot, poland_conf_0_perc, ".", '-', 7, "Poland"),
               (days_tot, uk_conf_0_perc, ".", '-', 6, "UK"),
               (days_tot, us_conf_0_perc, ".", '-', 5, "US"),               
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative confirmed cases "\
                     "in Finland and other European Countries + UK & US \n"\
                     "in percentage of each Country population",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=10, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)
In [124]:
# Plotting cumulative confirmed cases over time for Brazil, Russia and India
# compared to Italy in percentage of each Country population
cust_line_plot((days_tot, brazil_conf_0_perc, ".", '-', 2, "Brazil"),
               (days_tot, russia_conf_0_perc, ".", '-', 0, "Russia"),
               (days_tot, india_conf_0_perc, ".", '-', 1, "India"),
               (days_tot, italy_conf_0_perc, ".", '-', 3, "Italy"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative confirmed cases "\
                     "in Brazil, Russia and India \n"\
                     "in percentage of each Country population",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=10, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)

7.9.3. Deceased Cases: Summary of Findings from the Analysis

In the attempt to eliminate the variability due to different testing policies in different Countries, similar plots have been created by taking the deceased cases rather than the cumulative confirmed cases as a reference curve.

Finland has the second lowest curve in Scandinavia, after Norway, and the third lowest if also Estonia is counted. Sweden has the higherst curve. (This is the same result that has been obtained before normalization).

Among the analyzed EU Countries, Belgium is the Country with the highest deceased cases curve, followed by Spain and Italy.

Poland has a deceased cases curve lower than Finland.

In [125]:
# Comparing cumulative deceased cases over time for Finland,
# and other Scandinavian Countries plus Estonia in percentage of the Country population
cust_line_plot((days_tot, finland_deceas_0_perc, ".", '-', 0, "Finland"),
               (days_tot, denmark_deceas_0_perc, ".", '-', 3, "Denmark"),
               (days_tot, norway_deceas_0_perc, ".", '-', 6, "Norway"),
               (days_tot, sweden_deceas_0_perc, ".", '-', 8, "Sweden"),
               (days_tot, estonia_deceas_0_perc, ".", '-', 7, "Estonia"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative deceased cases "\
                     "in Finland and other Scandinavian Countries plus Estonia\n"\
                     "in percentage of each Country population",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=10, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)
In [126]:
# Comparing cumulative deceased cases over time for Finland,
# and other European Countries in percentage of the Country population
cust_line_plot((days_tot, finland_deceas_0_perc, ".", '-', 0, "Finland"),
               (days_tot, italy_deceas_0_perc, ".", '-', 2, "Italy"),
               (days_tot, spain_deceas_0_perc, ".", '-', 1, "Spain"),
               (days_tot, germany_deceas_0_perc, ".", '-', 4, "Germany"),
               (days_tot, france_deceas_0_perc, ".", '-', 3, "France"),
               (days_tot, switzerland_deceas_0_perc, ".", '-', 6, "Switzerland"),
               (days_tot, luxembourg_deceas_0_perc, ".", '-', 9, "Luxembourg"),
               (days_tot, belgium_deceas_0_perc, ".", '-', 8, "Belgium"),
               (days_tot, ireland_deceas_0_perc, ".", '-', 7, "Ireland"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative deceased cases "\
                     "in Finland and other European Countries \n"\
                     "in percentage of each Country population",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=10, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)
In [127]:
# Comparing cumulative deceased cases over time for Finland,
# and other European Countries + UK & US in percentage of the Country population
cust_line_plot((days_tot, finland_deceas_0_perc, ".", '-', 0, "Finland"),
               (days_tot, netherlands_deceas_0_perc, ".", '-', 4, "Netherlands"),
               (days_tot, austria_deceas_0_perc, ".", '-', 3, "Austria"),
               (days_tot, portugal_deceas_0_perc, ".", '-', 2, "Portugal"),
               (days_tot, poland_deceas_0_perc, ".", '-', 7, "Poland"),
               (days_tot, uk_deceas_0_perc, ".", '-', 6, "UK"),
               (days_tot, us_deceas_0_perc, ".", '-', 5, "US"),               
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative deceased cases "\
                     "in Finland and other European Countries + UK & US \n"\
                     "in percentage of each Country population",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=10, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)
In [128]:
# Plotting cumulative deceased cases over time for Brazil, Russia and India
# compared to Italy in percentage of each Country population
cust_line_plot((days_tot, brazil_deceas_0_perc, ".", '-', 2, "Brazil"),
               (days_tot, russia_deceas_0_perc, ".", '-', 0, "Russia"),
               (days_tot, india_deceas_0_perc, ".", '-', 1, "India"),
               (days_tot, italy_deceas_0_perc, ".", '-', 3, "Italy"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative deceased cases "\
                     "in Brazil, Russia and India \n"\
                     "in percentage of each Country population",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=10, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)

7.10. Demographic Considerations

In [129]:
print("The range of the median age in the Countries that are analyzed here is: "\
      "{:.1f} years"\
     .format(median_age_range), "\n")
print("The range of the median age in the Scandinavian Countries that are analyzed here is: "\
      "{:.1f} years"\
     .format(scand_median_age_range), "\n")
print("The range of the median age in the EU Countries that are analyzed here is: "\
      "{:.1f} years"\
     .format(eu_median_age_range))
The range of the median age in the Countries that are analyzed here is: 18.9 years 

The range of the median age in the Scandinavian Countries that are analyzed here is: 6.0 years 

The range of the median age in the EU Countries that are analyzed here is: 9.2 years

7.11. World View

By looking the all world, the virus is still in the linear growing phase and there is no sign of slowing down.

In [130]:
# Plotting daily cumulative cases in the all world
cust_line_plot((days_tot, world_conf_tot, ".", '-', 0, "confirmed cases"),
               (days_tot, world_recov_tot, ".", '-', 2, "recovered cases"),
               (days_tot, world_deceas_tot, ".", '-', 3, "deceased cases"),
               (days_tot, world_act_tot, ".", '-', 1, "active cases"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative cases in the all world "\
                     "over time",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=10, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)
In [131]:
# Plotting new daily cases in the all world
cust_bar_plot((days_tot, world_conf_incr, 0, ""),
              figsize_w=18, figsize_h=12,
              title="Coronavirus COVID-19 new daily confirmed cases in the all world",
              title_fs=18, title_offset=20,
              rem_borders=True,
               label_fs=12, tick_fs=10, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=False,
               leg_fs=12,
               legend_loc=0)
In [132]:
# Plotting increments in the active cases in the all world
cust_bar_plot((days_tot, calc_increments(world_act_tot), 1, ""),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 increments in the active cases "\
                     "in the all world",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=10, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=False,
               leg_fs=12,
               legend_loc=0)

7.11.1. Lethality

The estimated average daily number of deaths due to other reasons has been added with the only scope of putting the numbers into context.

In this comparison the deaths by other reasons are estimated with a linear model, which is clearly an approximation since, for example, seasonal flu and suicides follows certain yearly patterns.

On 4/16 the number of reported deaths due to COVID-19 has overtaken the estimated number of deaths due to seasonal flu since the start of the year.

On 5/13 the number of reported deaths due to COVID-19 has overtaken the estimated number of deaths by suicide since the start of the year.

Currently, the number of deaths by COVID-19 grows somehow linearly at about 5000 deaths/day. Therefore, unless this growth will slow down, the number of estimated deaths due to other reasons (like for example road traffic accidents) might at a certain point become higher.

By assuming the number of COVID-19 reported deaths worldwide will stay constant at around 4000/day for the rest of the year, by the end of the year the number of deaths by COVID-19 worldwide would reach about 1.2 millions, whereas the number of deaths by seasonal flu is estimated to be around 470.000 (-/+ 38%). This means an overall COVID-19 mortality rate 2.6 times higher than seasonal flu and slightly lower than traffic road accidents (1.3 millions).

The following shall be noted:

  • The deaths by COVID-19 might be under estimated due to the fact that not all the population is tested
  • The average deaths by seasonal flu in year 2020 might be less than normal due to the high hand hygiene that has been introduced due to the novel Coronavirus. Similarly, the deaths due to road traffic accidents might be slight less than expected due to the reduced mobility of people due to containment measures
  • This comparison tells nothing about the IFR. In particular, it should be noted that, without the containment measures that have been adopted worldwide, the number of COVID-19 deaths would have been very likely considerably higher

Sources for the additional info:
- https://www.worldometers.info/
- https://www.who.int/mental_health/prevention/suicide/suicideprevent/en/
- https://www.who.int/mediacentre/events/meetings/2011/road_safety/en/
- https://www.who.int/news-room/fact-sheets/detail/tobacco

In [133]:
# Plotting new daily deceased cases in the all world
cust_bar_plot((days_tot, world_deceas_incr, 3, 
               "Daily (reported) deceased cases by COVID-19"),
              figsize_w=18, figsize_h=12,
              title="Coronavirus COVID-19 new daily deceased cases "\
                    "in the all world",
              title_fs=18, title_offset=20,
              rem_borders=True,
              label_fs=12, tick_fs=10, 
              x_label="month/day",
              rot=90,
              y_label=None,
              legend=True,
              leg_fs=12,
              legend_loc=0,
              first_line_y=1288,
              first_line_y_l="Average daily estimated deaths by seasonal flu",
              second_line_y=2192,
              second_line_y_l="Average daily estimated deaths by suicides",
              third_line_y=3561,
              third_line_y_l="Average daily estimated number of deaths "\
                             "by road traffic accidents",
              #fourth_line_y=19178,
              #fourth_line_y_l="Average daily estimated deaths by direct tobacco smoking"
             )
In [134]:
# Creating a series containing the number of deaths by different causes
# so far this year
deceas_causes = pd.Series([world_deceas_tot.iloc[-1],
                           1288*(len(days_tot)+21),
                           2192*(len(days_tot)+21),
                           3561*(len(days_tot)+21),
                           19178*(len(days_tot)+21)],
                          index=["Reported deaths by COVID-19",
                                 "Estimated deaths by seasonal flu",
                                 "Estimated deaths by suicides",
                                 "Estimated deaths by road traffic accidents",
                                 "Estimated deaths by direct tobacco smoking"])
In [135]:
# Showing the number of deaths by different causes so far this year in a bar plot
plot_cust_hbar(deceas_causes.sort_values(),
               figsize_w=8, figsize_h=6,
               frame=False, grid=False,
               ref_font_size=12,
               title_text="Number of deaths by different causes so far this year "\
                          "compared to COVID-19",
               title_offset=20,
               color_numb=3,
               categ_labels=True,
               labels=None,
               rot=0,
               show_values=True,
               omitted_value=0,
               percent=False,
               center_al=False,
               visible_digits=2)
In [136]:
# Estimating worldwide COVID-19 deaths by the end of the year
# (hypothesis: constant growth)
# (estimation date: May 27th)
est_flu_deaths_2020 = 1288*365
est_road_acc_deaths_2020 = 3561*365
est_COVID19_deaths_2020 = 350453+4000*218
print("Estimated deaths by COVID-19 by the end of the year: {}\n"\
      "(Hypothesis: constant growth).".\
      format(est_COVID19_deaths_2020))
print("Estimated deaths by seasonal flu by the end of the year: {}.".\
      format(est_flu_deaths_2020))
print("Estimated deaths by traffic road accidents by the end of the year: {}.".\
      format(est_road_acc_deaths_2020))
# Comparing the result with other causes of deaths
COVID_flu_ration = round(est_COVID19_deaths_2020/est_flu_deaths_2020, 2)
COVID_road_ration = round(est_COVID19_deaths_2020/est_road_acc_deaths_2020, 2)
print("\nBy assuming a constant increase in the number of COVID-19 deaths,\n"\
      "by the end of the year the number of COVID-19 deaths will be {} times \n"\
      "the number of estimated deaths by seasonal flu and {} times the number of \n"\
      "estimated deaths by traffic road accidents.".\
      format(COVID_flu_ration, COVID_road_ration))
Estimated deaths by COVID-19 by the end of the year: 1222453
(Hypothesis: constant growth).
Estimated deaths by seasonal flu by the end of the year: 470120.
Estimated deaths by traffic road accidents by the end of the year: 1299765.

By assuming a constant increase in the number of COVID-19 deaths,
by the end of the year the number of COVID-19 deaths will be 2.6 times 
the number of estimated deaths by seasonal flu and 0.94 times the number of 
estimated deaths by traffic road accidents.

8. Statistics

8.1. World View

In [137]:
# Reordering the columns
daily_rep_group = daily_rep_group.reindex(columns=['Confirmed',
                                                   'Recovered',
                                                   'Deaths',
                                                   'Active'])
In [138]:
print("Grand Total Worldwide:\n")
print(daily_rep_group.sum().to_string())
# Confirmed cases in percentage of the total population
cont_perc_world = daily_rep_group.sum()[0]/(7.8*1000000000)*100
print("\nConfirmed cases in percentage of the total population:")
print("{:.2f} %".format(cont_perc_world))
Grand Total Worldwide:

Confirmed    14947428
Recovered     8466990
Deaths         616443
Active        5863995

Confirmed cases in percentage of the total population:
0.19 %
In [139]:
# Mortality (worldwide)
mort = (daily_rep_group.sum()[2]/daily_rep_group.sum()[0])*100
print("'Calculated' mortality worldwide (Case Fatality Rate): {:.2f} %\n".format(mort))
print("IMPORTANT NOTE:\nThe actual mortality (Infection Fatality Rate) could be much lower",
      "due to the fact that\nnot all infected people have been tested!\n"
      "On the other hand, the counted deaths are due to infections that happened",
      "weeks ago.\nThis means that, as long as the contagius cases increase, "
      "the calculated mortality\nis under-estimated.")
'Calculated' mortality worldwide (Case Fatality Rate): 4.12 %

IMPORTANT NOTE:
The actual mortality (Infection Fatality Rate) could be much lower due to the fact that
not all infected people have been tested!
On the other hand, the counted deaths are due to infections that happened weeks ago.
This means that, as long as the contagius cases increase, the calculated mortality
is under-estimated.

8.2. Top Ten Countries

In [140]:
# The top 10 Countries by number of confirmed cases in descending order
conf_top_10 = daily_rep_group.sort_values(by ='Confirmed', ascending = False).\
              head(10)['Confirmed']
In [141]:
# Showing the top 10 Countries by number of confirmed cases in a bar plot
plot_cust_hbar(conf_top_10.sort_values(),
               figsize_w=16, figsize_h=12,
               frame=False, grid=False,
               ref_font_size=14,
               title_text="Countries by number of confirmed cases "\
                          "in descending order (top 10)",
               title_offset=20,
               color_numb=0,
               categ_labels=True,
               labels=None,
               rot=0,
               show_values=True,
               omitted_value=0,
               percent=False,
               center_al=True,
               visible_digits=2)
In [142]:
# The top 10 Countries by number of recovered cases in descending order
recov_top_10 = daily_rep_group.sort_values(by ='Recovered', ascending = False).\
               head(10)['Recovered']
In [143]:
# Showing the top 10 Countries by number of recovered cases in a bar plot
plot_cust_hbar(recov_top_10.sort_values(),
               figsize_w=16, figsize_h=12,
               frame=False, grid=False,
               ref_font_size=14,
               title_text="Countries by number of recovered cases "\
                          "in descending order (top 10)",
               title_offset=20,
               color_numb=2,
               categ_labels=True,
               labels=None,
               rot=0,
               show_values=True,
               omitted_value=0,
               percent=False,
               center_al=True,
               visible_digits=2)
In [144]:
# The top 10 Countries by number of deceased cases in descending order
deceas_top_10 = daily_rep_group.sort_values(by ='Deaths', ascending = False).\
                head(10)['Deaths']
In [145]:
# Showing the top 10 Countries by number of deceased cases in a bar plot
plot_cust_hbar(deceas_top_10.sort_values(),
               figsize_w=16, figsize_h=12,
               frame=False, grid=False,
               ref_font_size=14,
               title_text="Countries by number of deceased cases "\
                          "in descending order (top 10)",
               title_offset=20,
               color_numb=3,
               categ_labels=True,
               labels=None,
               rot=0,
               show_values=True,
               omitted_value=0,
               percent=False,
               center_al=True,
               visible_digits=2)
In [146]:
# The top 10 Countries by number of active cases in descending order
act_top_10 = daily_rep_group.sort_values(by ='Active', ascending = False).\
             head(10)['Active']
In [147]:
# Showing the top 10 Countries by number of active cases in a bar plot
plot_cust_hbar(act_top_10.sort_values(),
               figsize_w=16, figsize_h=12,
               frame=False, grid=False,
               ref_font_size=14,
               title_text="Countries by number of active cases "\
                          "in descending order (top 10)",
               title_offset=20,
               color_numb=1,
               categ_labels=True,
               labels=None,
               rot=0,
               show_values=True,
               omitted_value=0,
               percent=False,
               center_al=False,
               visible_digits=2)
In [148]:
print("\n(*) Note that for certain Countries the figures in the previous three tables",
      "contain also off shore territories.")
print("For example, for France the numbers include:\n\n",
      "- French Polynesia\n",
      "- New caledonia\n",
      "- St Martina\n",
      "- Saint Barthelemyia\n",
      "- French Guiana\n",
      "- Guadelupe\n",
      "- Mayotte\n",
      "- Reunion\n")
(*) Note that for certain Countries the figures in the previous three tables contain also off shore territories.
For example, for France the numbers include:

 - French Polynesia
 - New caledonia
 - St Martina
 - Saint Barthelemyia
 - French Guiana
 - Guadelupe
 - Mayotte
 - Reunion

8.3. Finland

In [149]:
# Visualizing the current status in Finland
print("Latest situation in Finland:\n")
print(daily_rep_group.loc['Finland'].to_string())
# Confirmed cases in percentage of the total population
cont_perc_fin = daily_rep_group.loc['Finland'][0]/(5.513*1000000)*100
print("\nConfirmed cases in percentage of the total population:")
print("{:.2f} %".format(cont_perc_fin))
Latest situation in Finland:

Confirmed    7351
Recovered    6880
Deaths        328
Active        143

Confirmed cases in percentage of the total population:
0.13 %
In [150]:
# Mortality (Finland)
mort_fin = (daily_rep_group.loc['Finland'][2]/daily_rep_group.loc['Finland'][0])*100
print("'Calculated' mortality in Finland (Case Fatality Rate): {:.2f} %\n".format(mort_fin))
print("IMPORTANT NOTE:\nThe actual mortality (Infection Fatality Rate) could be much lower",
      "due to the fact that\nnot all infected people have been tested!\n"
      "On the other hand, the counted deaths are due to infections that happened",
      "weeks ago.\nThis means that, as long as the contagius cases increase, "
      "the calculated mortality\nis under-estimated.")
'Calculated' mortality in Finland (Case Fatality Rate): 4.46 %

IMPORTANT NOTE:
The actual mortality (Infection Fatality Rate) could be much lower due to the fact that
not all infected people have been tested!
On the other hand, the counted deaths are due to infections that happened weeks ago.
This means that, as long as the contagius cases increase, the calculated mortality
is under-estimated.

9. Conclusions

Currently, the number of recorded COVID-19 cases is about 0.19% of the world population and has produced already more deaths than seasonal flu worldwide. The number of new confirmed cases is growing exponentially.

Even though the virus originated from China, it has spread west to Europe and then further west to US and South America.

The first wave has been over in China around the end of winter and in most of Europe around the end of spring.

Currently, most of the confirmed cases are in US, followed by Brazil, India and Russia. UK has also a quite high number of active cases. In China a second wave has followed and at the beginning of the summer there are signs of a possible third wave.

Unfortunately, the Finnish Institute for Health and Welfare (THL) does not seem to keep a complete public API for uploading daily the time series. In particular, there is no reliable estimate of the number of recovered cases and therefore it is not possible to get a reliable curve for the active cases, which is actually the most important curve to follow the evolution of the epidemic.

The confirmed reported cases are about 0.13% of the Finnish population. The Finnish curve of the confirmed cases is currently in a slowing down growing phase. The confirmed cases curve is the lowest pro capita in Scandinavia and is one of the lowest in Europe. The same applies to the cumulative deceased cases, suggesting that the low curve in the cumulative confirmed cases might not be due to a too relaxed testing policy. The case fatality rate is slightly higher than the world average.

Even though the actual percentage of people that have been in contact with the virus is certainly higher, it should be noted that such low numbers suggest that the immunity in Finland is at very low levels currently (there is still a quite high percentage of susceptible people).

There might be different reasons for the relative difficulty of the virus to spread in Finland, including remote geographical location, low population density, low level of pollution, culture and local practices (as keeping physical distances when greeting, spending a lot of time outdoor in Nature and vising sauna frequently) and prompt containment actions.

In a further study it would be interesting to verify those assumptions scientifically.

10. Acknowledgements

Many thanks to Johns Hokpins University for sharing and maintaining daily the source csv files.

Many thanks to Coursera for providing a very informative course.

Many thanks to colleagues and friends who have contributed by providing links and comments.


In [151]:
print("Last plotted day:", dt.datetime.strptime(last_day, "%m-%d-%Y").\
      date().strftime("%d-%b-%Y"))
end_time = dt.datetime.utcnow()
script_duration = end_time - start_time
print("\nRunning time for the full script (hh:mm:ss):", script_duration)
Last plotted day: 21-Jul-2020

Running time for the full script (hh:mm:ss): 0:02:23.686926

Used software:
- Jupyter Notebook server 6.0.1
- Python 3.6.8
- numpy 1.18.2
- pandas 1.0.3
- matplotlib 3.1.2
- seaborn 0.9.0
- regex 2019.8.19
on top of Linux Ubuntu 18.04